Igor R. Shafarevich - Alexey O. Remizov


Linear Algebra 
and Geometry 


Springer


Translated by David Kramer and Lena Nekludova


Igor R. Shafarevich
Steklov Mathematical Institute
Russian Academy of Sciences
Moscow, Russia

Alexey O. Remizov
CMAP
Ecole Polytechnique CNRS
Palaiseau Cedex, France

Translators:

David Kramer 


Lancaster, PA, USA 


Lena Nekludova 
Brookline, MA, USA 


The original Russian edition was published as “Linejnaya algebra i geometriya” by Fizmatlit, 
Moscow, 2009 


ISBN 978-3-642-30993-9 ISBN 978-3-642-30994-6 (eBook) 
DOI 10.1007/978-3-642-30994-6 
Springer Heidelberg New York Dordrecht London 


Library of Congress Control Number: 2012946469 
Mathematics Subject Classification (2010): 15-01, 51-01 


© Springer-Verlag Berlin Heidelberg 2013 

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of 
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, 
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information 
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology 
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection 
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered 
and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of 
this publication or parts thereof is permitted only under the provisions of the Copyright Law of the 
Publisher’s location, in its current version, and permission for use must always be obtained from Springer. 
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations 
are liable to prosecution under the respective Copyright Law. 

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication 
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant 
protective laws and regulations and therefore free for general use. 

While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility
for any errors or omissions that may be made. The publisher makes no warranty, express or implied,
with respect to the material contained herein.


Printed on acid-free paper 


Springer is part of Springer Science+Business Media (www.springer.com) 


Preface 


This book is the result of a series of lectures on linear algebra and the geometry of 
multidimensional spaces given in the 1950s through 1970s by Igor R. Shafarevich 
at the Faculty of Mechanics and Mathematics of Moscow State University. 

Notes for some of these lectures were preserved in the faculty library, and these 
were used in preparing this book. We have also included some topics that were 
discussed in student seminars at the time. All the material included in this book is 
the result of joint work of both authors. 

We employ in this book some results on the algebra of polynomials that are 
usually taught in a standard course in algebra (most of which are to be found in 
Chaps. 2 through 5 of this book). We have used only a few such results, without 
proof: the possibility of dividing one polynomial by another with remainder; the 
theorem that a polynomial with complex coefficients has a complex root; that every 
polynomial with real coefficients can be factored into a product of irreducible first- 
and second-degree factors; and the theorem that the number of roots of a polynomial 
that is not identically zero is at most the degree of the polynomial. 

To provide a visual basis for this course, it was preceded by an introductory 
course in analytic geometry, to which we shall occasionally refer. In addition, some 
topics and examples are included in this book that are not really part of a course in 
linear algebra and geometry but are provided for illustration of various topics. Such 
items are marked with an asterisk and may be omitted if desired. 

For the convenience of the reader, we present here the system of notation used
in this book. For vector spaces we use sans serif letters: $\mathsf{L}, \mathsf{M}, \mathsf{N}, \ldots$;
for vectors, we use boldface italics: $\boldsymbol{x}, \boldsymbol{y}, \boldsymbol{z}, \ldots$; for linear
transformations, we use calligraphic letters: $\mathcal{A}, \mathcal{B}, \mathcal{C}, \ldots$; and for the
corresponding matrices, we use uppercase italic letters: $A, B, C, \ldots$.


Acknowledgements 


The authors are grateful to M.I. Zelinkin, D.O. Orlov, and Ya.V. Tatarinov for reading
parts of an earlier version of this book and making a number of useful suggestions
and remarks. The authors are also deeply grateful to our editor, S. Kuleshov,
who gave the manuscript a very careful reading. His advice resulted in a number
of important changes and additions. In particular, some parts of this book would
not have appeared in their present form had it not been for his participation in
this project. We would also like to offer our hearty thanks to the translators, David
Kramer and Lena Nekludova, for their English translation and in particular for
correcting a number of inaccuracies and typographical errors that were present in the
Russian edition of this book.


Contents 


1 Linear Equations
1.1 Linear Equations and Functions
1.2 Gaussian Elimination
1.3 Examples*
2 Matrices and Determinants
2.1 Determinants of Orders 2 and 3
2.2 Determinants of Arbitrary Order
2.3 Properties that Characterize Determinants
2.4 Expansion of a Determinant Along Its Columns
2.5 Cramer's Rule
2.6 Permutations, Symmetric and Antisymmetric Functions
2.7 Explicit Formula for the Determinant
2.8 The Rank of a Matrix
2.9 Operations on Matrices
2.10 Inverse Matrices
3 Vector Spaces
3.1 The Definition of a Vector Space
3.2 Dimension and Basis
3.3 Linear Transformations of Vector Spaces
3.4 Change of Coordinates
3.5 Isomorphisms of Vector Spaces
3.6 The Rank of a Linear Transformation
3.7 Dual Spaces
3.8 Forms and Polynomials in Vectors
4 Linear Transformations of a Vector Space to Itself
4.1 Eigenvectors and Invariant Subspaces
4.2 Complex and Real Vector Spaces
4.3 Complexification
4.4 Orientation of a Real Vector Space
5 Jordan Normal Form
5.1 Principal Vectors and Cyclic Subspaces
5.2 Jordan Normal Form (Decomposition)
5.3 Jordan Normal Form (Uniqueness)
5.4 Real Vector Spaces
5.5 Applications*
6 Quadratic and Bilinear Forms
6.1 Basic Definitions
6.2 Reduction to Canonical Form
6.3 Complex, Real, and Hermitian Forms
7 Euclidean Spaces
7.1 The Definition of a Euclidean Space
7.2 Orthogonal Transformations
7.3 Orientation of a Euclidean Space*
7.4 Examples*
7.5 Symmetric Transformations
7.6 Applications to Mechanics and Geometry*
7.7 Pseudo-Euclidean Spaces
7.8 Lorentz Transformations
8 Affine Spaces
8.1 The Definition of an Affine Space
8.2 Affine Spaces
8.3 Affine Transformations
8.4 Affine Euclidean Spaces and Motions
9 Projective Spaces
9.1 Definition of a Projective Space
9.2 Projective Transformations
9.3 The Cross Ratio
9.4 Topological Properties of Projective Spaces*
10 The Exterior Product and Exterior Algebras
10.1 Plücker Coordinates of a Subspace
10.2 The Plücker Relations and the Grassmannian
10.3 The Exterior Product
10.4 Exterior Algebras*
10.5 Appendix*
11 Quadrics
11.1 Quadrics in Projective Space
11.2 Quadrics in Complex Projective Space
11.3 Isotropic Subspaces
11.4 Quadrics in a Real Projective Space
11.5 Quadrics in a Real Affine Space
11.6 Quadrics in an Affine Euclidean Space
11.7 Quadrics in the Real Plane*
12 Hyperbolic Geometry
12.1 Hyperbolic Space*
12.2 The Axioms of Plane Geometry*
12.3 Some Formulas of Hyperbolic Geometry*
13 Groups, Rings, and Modules
13.1 Groups and Homomorphisms
13.2 Decomposition of Finite Abelian Groups
13.3 The Uniqueness of the Decomposition
13.4 Finitely Generated Torsion Modules over a Euclidean Ring*
14 Elements of Representation Theory
14.1 Basic Concepts of Representation Theory
14.2 Representations of Finite Groups
14.3 Irreducible Representations
14.4 Representations of Abelian Groups
Historical Note
References


Preliminaries 


In this book we shall use a number of concepts from set theory. These ideas appear 
in most mathematics courses, and so they will be familiar to some readers. However, 
we shall recall them here for convenience. 


Sets and Mappings 


A set is a collection of arbitrarily chosen objects defined by certain precisely
specified properties (for example, the set of all real numbers, the set of all positive
numbers, the set of solutions of a given equation, the set of points that form a given
geometric figure, the set of wolves or trees in a given forest). If a set consists of
a finite number of elements, then it is said to be finite, and if not, it is said to be
infinite. We shall employ standard notation for certain important sets, denoting the
set of natural numbers by $\mathbb{N}$, the set of integers by $\mathbb{Z}$, the set of rational
numbers by $\mathbb{Q}$, the set of real numbers by $\mathbb{R}$, and the set of complex numbers
by $\mathbb{C}$. The set of natural numbers not exceeding a given natural number $n$, that is,
the set consisting of $1, 2, \ldots, n$, will be denoted by $\mathbb{N}_n$. The objects that make up
a set are called its elements or sometimes points. If $x$ is an element of the set $M$,
then we shall write $x \in M$. If we need to specify that $x$ is not an element of $M$,
then we shall write $x \notin M$.

A set $S$ consisting of certain elements of the set $M$ (that is, every element of the
set $S$ is also an element of the set $M$) is called a subset of $M$. We write $S \subset M$.
For example, $\mathbb{N}_n \subset \mathbb{N}$ for arbitrary $n$, and likewise, we have
$\mathbb{N} \subset \mathbb{Z}$, $\mathbb{Z} \subset \mathbb{Q}$, $\mathbb{Q} \subset \mathbb{R}$, and $\mathbb{R} \subset \mathbb{C}$.
A subset of $M$ consisting of elements $x_\alpha \in M$ (where the index $\alpha$ runs
over a given finite or infinite set) will be denoted by $\{x_\alpha\}$. It is convenient to include
among the subsets of a set $M$ the set that contains no elements at all. We call this
set the empty set and denote it by $\varnothing$.

Let $M$ and $N$ be two arbitrary sets. The collection of all elements that belong
simultaneously to both $M$ and $N$ is called the intersection of $M$ and $N$ and is denoted
by $M \cap N$. If we have $M \cap N = \varnothing$, then we say that the sets $M$ and $N$ are disjoint.
The collection of elements belonging to either $M$ or $N$ (or to both) is called the
union of $M$ and $N$ and is denoted by $M \cup N$. Finally, the set of elements that belong
to $M$ but do not belong to $N$ is called the complement of $N$ in $M$ and is denoted by
$M \setminus N$.

We say that a set $M$ has an equivalence relation defined on it if for every pair of
elements $x$ and $y$ of $M$, either the elements $x$ and $y$ are equivalent (in which case
we write $x \sim y$) or they are inequivalent ($x \not\sim y$), and if in addition, the following
conditions are satisfied:


1. Every element of $M$ is equivalent to itself: $x \sim x$ (reflexivity).
2. If $x \sim y$, then $y \sim x$ (symmetry).
3. If $x \sim y$ and $y \sim z$, then $x \sim z$ (transitivity).


If an equivalence relation is defined on a set $M$, then $M$ can be represented as the
union of a (finite or infinite) collection of sets $M_\alpha$ called equivalence classes with
the following properties:


(a) Every element $x \in M$ is contained in one and only one equivalence class $M_\alpha$.
In other words, the sets $M_\alpha$ are disjoint, and their union (finite or infinite) is the
entire set $M$.

(b) Elements $x$ and $y$ are equivalent ($x \sim y$) if and only if they belong to the same
subset $M_\alpha$.


Clearly, the converse holds as well: if we are given a representation of a set $M$
as the union of subsets $M_\alpha$ satisfying property (a), then setting $x \sim y$ if (and only
if) these elements belong to the same subset $M_\alpha$, we obtain an equivalence relation
on $M$.

From the above reasoning, it is clear that the equivalence thus defined is completely
abstract; there is no indication as to precisely how it is decided whether two
elements $x$ and $y$ are equivalent. It is necessary only that conditions 1 through 3
above be satisfied. Therefore, on a particular set $M$ one can define a wide variety of
equivalence relations.

Let us consider a few examples. Let the set $M$ be the natural numbers, that is,
$M = \mathbb{N}$. Then on this set it is possible to define an equivalence relation by
the condition that $x \sim y$ if $x$ and $y$ have the same remainder on division by a given
natural number $n$. It is clear that conditions 1 through 3 above are satisfied, and
$\mathbb{N}$ can be represented as the union of $n$ classes (in the case $n = 1$, all the natural
numbers are equivalent to each other and so there is only one class; if $n = 2$, there
are two classes, namely the even numbers and the odd numbers; and so on). Now let
$M$ be the set of points in the plane or in space. We can define an equivalence relation
by the rule that $x \sim y$ if the points $x$ and $y$ are the same distance from a given fixed
point $O$. Then the equivalence classes are all circles (in the case of the plane) or
spheres (in space) with center at $O$. If, on the other hand, we wanted to consider
two points equivalent if the distance between them is some given number, then we
would not have an equivalence relation, since transitivity would not be satisfied.
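
The first example, and the failure of transitivity in the last one, can be checked mechanically on a finite initial segment of the natural numbers. The following Python sketch is only an illustration: the helper name `equivalence_classes`, the range 1 to 20, and the values n = 3 and d = 1 are assumptions made here, not part of the text.

```python
from itertools import combinations


def equivalence_classes(elements, n):
    """Group elements by remainder on division by n (x ~ y iff x % n == y % n)."""
    classes = {}
    for x in elements:
        classes.setdefault(x % n, []).append(x)
    return list(classes.values())


elements = range(1, 21)                  # finite stand-in for the natural numbers
classes = equivalence_classes(elements, 3)

# Property (a): the classes are pairwise disjoint and their union is the whole set.
assert all(set(a).isdisjoint(b) for a, b in combinations(classes, 2))
assert sorted(x for c in classes for x in c) == list(elements)

# "Distance equal to a given number d" is not transitive (points on a line, d = 1).
d = 1
related = lambda x, y: abs(x - y) == d
assert related(0, 1) and related(1, 2) and not related(0, 2)
```

With n = 3 the sketch produces three classes, and with n = 1 a single class, in agreement with the count of n classes stated above.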

In this book, we shall encounter several types of equivalence relations (for exam- 
ple, on the set of square matrices). 




A mapping from a set $M$ into a set $N$ is a rule that assigns to every element
of the set $M$ a particular element of $N$. For example, if $M$ is the set of all bears
currently alive on Earth and $N$ is the set of positive numbers, then assigning to each
bear its weight (for example in kilograms) constitutes a mapping from $M$ to $N$. We
shall call such mappings of a set $M$ into $N$ functions on $M$ with values in $N$. We
shall usually denote such an assignment by one of the letters $f, g, \ldots$ or $F, G, \ldots$.
Mappings from a set $M$ into a set $N$ are indicated with an arrow and are written thus:
$f : M \to N$. An element $y \in N$ assigned to an element $x \in M$ is called the value of
the function $f$ at the point $x$. This is written using an arrow with a tail, $f : x \mapsto y$,
or the equality $y = f(x)$. Later on, we shall frequently display mappings between
sets in the form of a diagram:

$$M \xrightarrow{f} N.$$

If the sets $M$ and $N$ coincide, then $f : M \to M$ is called a mapping of $M$ into
itself. A mapping of a set into itself that assigns to each element $x$ that same element
$x$ is called an identity mapping. It will be denoted by the letter $e$, or if it is important
to specify the underlying set $M$, by $e_M$. Thus in our notation, we have $e_M : M \to M$
and $e_M(x) = x$ for every $x \in M$.

A mapping $f : M \to N$ is called an injection or an injective mapping if different
elements of the set $M$ are assigned different elements of the set $N$, that is, it is
injective if $f(x_1) = f(x_2)$ always implies $x_1 = x_2$.

If $S$ is a subset of $N$ and $f : M \to N$ is a mapping, then the collection of all
elements $x \in M$ such that $f(x) \in S$ is called the preimage or inverse image of $S$
and is denoted by $f^{-1}(S)$. In particular, if $S$ consists of a single element $y \in N$,
then $f^{-1}(S)$ is called the preimage or inverse image of the element $y$ and is written
$f^{-1}(y)$. Using this terminology, we may say that a mapping $f : M \to N$ is
an injection if and only if for every element $y \in N$, its inverse image $f^{-1}(y)$ consists
of at most a single element. The words “at most” imply that certain elements
$y \in N$ may have an empty preimage. For example, let $M = N = \mathbb{R}$ and suppose
the mapping $f$ assigns to each real number $x$ the value $f(x) = \arctan x$. Then $f$ is
injective, since the inverse image $f^{-1}(y)$ consists of a single element if $|y| < \pi/2$ and
is the empty set if $|y| \ge \pi/2$.

If $S$ is a subset of $M$ and $f : M \to N$ is a mapping, then the collection of all
elements $y \in N$ such that $y = f(x)$ for some $x \in S$ is called the image of the subset
$S$ and is denoted by $f(S)$. In particular, the subset $S$ could be the entire set $M$, in
which case $f(M)$ is called the image of the mapping $f$. We note that the image of
$f$ does not have to consist of the entire set $N$. For example, if $M = N = \mathbb{R}$ and
$f$ is the squaring operation (raising to the second power), then $f(M)$ is the set of
nonnegative real numbers and does not coincide with the set $\mathbb{R}$.
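
For mappings between finite sets, the notions of image, preimage, and injectivity can be computed directly. The sketch below is an illustration under stated assumptions: a mapping is represented as a Python dictionary, the helper names `image`, `preimage`, and `is_injective` are introduced here and do not come from the text, and the squaring map is restricted to the five integers from -2 to 2.

```python
def image(f, S):
    """f(S): the set of all values f[x] with x in S."""
    return {f[x] for x in S}


def preimage(f, S):
    """f^{-1}(S): the set of all x in the domain of f with f[x] in S."""
    return {x for x in f if f[x] in S}


def is_injective(f):
    """A finite mapping is injective iff distinct arguments have distinct values."""
    return len(set(f.values())) == len(f)


M = range(-2, 3)
square = {x: x * x for x in M}           # finite analogue of the squaring map

print(sorted(image(square, M)))          # [0, 1, 4]
print(sorted(preimage(square, {4})))     # [-2, 2]
print(preimage(square, {3}))             # set()
print(is_injective(square))              # False
```

The preimage of 4 has two elements, so the squaring map is not injective, and the preimage of 3 is empty, so its image is not the whole target set.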

If again $S$ is a subset of $M$ and $f : M \to N$ a mapping, then applying the mapping
only to elements of the set $S$ defines a mapping $f : S \to N$, called the restriction
of the mapping $f$ to $S$. In other words, the restriction mapping is defined by
taking $f(x)$ for each $x \in S$ as before and simply ignoring all $x \notin S$. Conversely, if
we start off with a mapping $f : S \to N$ defined only on the subset $S$, and then somehow
define $f(x)$ for the remaining elements $x \in M \setminus S$, then we obtain a mapping
$f : M \to N$, called an extension of $f$ to $M$.




A mapping $f : M \to N$ is bijective or a bijection if it is injective and the image
$f(M)$ is the entire set $N$, that is, $f(M) = N$. Equivalently, a mapping is a bijection
if for each element $y \in N$, there exists precisely one element $x \in M$ such that
$y = f(x)$.¹ In this case, it is possible to define a mapping from $N$ into $M$ that assigns to
each element $y \in N$ the unique element $x \in M$ such that $f(x) = y$. Such a mapping
is called the inverse of $f$ and is denoted by $f^{-1} : N \to M$. Now suppose we are
given sets $M$, $N$, $L$ and mappings $f : M \to N$ and $g : N \to L$, which we display in
the following diagram:


$$M \xrightarrow{f} N \xrightarrow{g} L. \tag{1}$$


Then application of $f$ followed by $g$ defines a mapping from $M$ to $L$ by the obvious
rule: first apply the mapping $f : M \to N$, which assigns to each element $x \in M$ an
element $y \in N$, and then apply the mapping $g : N \to L$ that takes an element $y$ to
some element $z \in L$. We thus obtain a mapping from $M$ to $L$ called the composition
of the mappings $f$ and $g$, written $g \circ f$ or simply $gf$. Using this notation, the
composition mapping is defined by the formula


$$(g \circ f)(x) = g(f(x)) \tag{2}$$


for an arbitrary x € M. We note that in equation (2), the letters f and g that denote 
the two mappings appear in the reverse order to that in the diagram (1). As we shall 
see later, such an arrangement has a number of advantages. 

As an example of the composition of mappings we offer the obvious equalities


$$e_N \circ f = f, \qquad f \circ e_M = f,$$


valid for any mapping $f : M \to N$, and likewise the equalities


$$f \circ f^{-1} = e_N, \qquad f^{-1} \circ f = e_M,$$


which are valid for any bijective mapping $f : M \to N$.
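
Formula (2) and the equalities for the identity and inverse mappings can be tried out on concrete functions. The following Python sketch makes several assumptions for the sake of illustration: the helper names `compose` and `identity`, the particular mappings `f`, `f_inv`, and `g` on the integers, and the finite sample of arguments are all chosen here and are not taken from the text. Checking finitely many points illustrates the identities but does not prove them.

```python
def compose(g, f):
    """Return the composition g∘f: apply f first, then g."""
    return lambda x: g(f(x))


def identity(x):
    """Identity mapping e: every element is sent to itself."""
    return x


# Hypothetical mappings on the integers, chosen only for illustration.
f = lambda x: x + 1          # f : M -> N, a bijection of the integers
f_inv = lambda y: y - 1      # its inverse f^{-1} : N -> M
g = lambda y: 2 * y          # g : N -> L

sample = range(-5, 6)        # finite sample of arguments

# (g∘f)(x) = g(f(x)); composing in the other order gives a different mapping here.
assert all(compose(g, f)(x) == 2 * (x + 1) for x in sample)
assert any(compose(g, f)(x) != compose(f, g)(x) for x in sample)

# e_N∘f = f and f∘e_M = f.
assert all(compose(identity, f)(x) == f(x) == compose(f, identity)(x) for x in sample)

# f∘f^{-1} = e_N and f^{-1}∘f = e_M.
assert all(compose(f, f_inv)(x) == x == compose(f_inv, f)(x) for x in sample)
```

Note that `compose(g, f)` lists `g` first although `f` is applied first, which mirrors the reversed order of letters remarked on after formula (2).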

The composition of mappings has an important property. Suppose that in addition
to the mappings shown in diagram (1), we have as well a mapping $h : L \to K$, where
$K$ is an arbitrary set. Then we have


$$h \circ (g \circ f) = (h \circ g) \circ f. \tag{3}$$


The truth of this claim follows at once from the definitions. First of all, it is apparent
that both sides of equation (3) contain a mapping from $M$ to $K$. Thus we need to
show that when applied to any element $x \in M$, both sides give the same element of
the set $K$. According to definition (2), for the left-hand side of (3), we obtain


$$(h \circ (g \circ f))(x) = h((g \circ f)(x)), \qquad (g \circ f)(x) = g(f(x)).$$


Substituting the second equation into the first, we finally obtain
$(h \circ (g \circ f))(x) = h(g(f(x)))$. Analogous reasoning shows that we obtain precisely
the same expression for the right-hand side of equation (3).


¹ Translator’s note: The term one-to-one is also used in this context. However, its use can be
confusing: an injection is sometimes called a one-to-one mapping, while a bijection is sometimes
called a one-to-one correspondence. In this book, we shall strive to stick to the terms injective and
bijective.

The property expressed by formula (3) is called associativity. Associativity plays
an important role, both in this course and in other branches of mathematics. Therefore,
we shall pause here to consider this concept in more detail. For the sake of
generality, we shall consider a set $M$ of arbitrary objects (they can be numbers,
matrices, mappings, and so on) on which is defined the operation of multiplication
associating two elements $a \in M$ and $b \in M$ with some element $ab \in M$, which we
call the product, such that it possesses the associative property:


$$(ab)c = a(bc). \tag{4}$$


The point of condition (4) is that without it, we can calculate the product of elements
$a_1, \ldots, a_m$ for $m > 2$ only if the sequence of multiplications is indicated by
parentheses, indicating which pairs of adjacent elements we are allowed to multiply.
For example, with $m = 3$, we have two possible arrangements of the parentheses:
$(a_1 a_2) a_3$ and $a_1 (a_2 a_3)$. For $m = 4$ we have five variants:


$$((a_1 a_2) a_3) a_4, \quad (a_1 (a_2 a_3)) a_4, \quad (a_1 a_2)(a_3 a_4), \quad a_1 ((a_2 a_3) a_4), \quad a_1 (a_2 (a_3 a_4)),$$


and so on. It turns out that if for three factors ($m = 3$), the product does not depend
on how the parentheses are ordered (that is, the associative property is satisfied),
then it will be independent of the arrangement of parentheses with any number of
factors.

This assertion is easily proved by induction on $m$. Indeed, let us suppose that
it is true for all products of $m$ or fewer elements, and let us consider products
of $m + 1$ elements $a_1, \ldots, a_m, a_{m+1}$ for all possible arrangements of parentheses.
It is easily seen that in this case, there are two possible alternatives: either
there is no parenthesis between elements $a_m$ and $a_{m+1}$, or else there is one.
Since by the induction hypothesis, the assertion is correct for $a_1, \ldots, a_m$, then in
the first case we obtain the product $(a_1 \cdots a_{m-1})(a_m a_{m+1})$, while in the second
case, we have $(a_1 \cdots a_m) a_{m+1} = ((a_1 \cdots a_{m-1}) a_m) a_{m+1}$. Introducing the notation
$a = a_1 \cdots a_{m-1}$, $b = a_m$, and $c = a_{m+1}$, we obtain the products $a(bc)$ and $(ab)c$,
the equality of which follows from property (4).
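
The count of arrangements and the independence of the product from the parentheses can be explored by brute force. The following Python sketch is an illustration only: the function name `all_products` is introduced here, string concatenation is used as a sample associative operation, and subtraction of integers as a sample non-associative one; none of these choices comes from the text.

```python
from operator import add, sub


def all_products(factors, op):
    """Values of the product of `factors`, kept in the given order, over every
    possible arrangement of parentheses."""
    if len(factors) == 1:
        return [factors[0]]
    results = []
    for k in range(1, len(factors)):          # position of the outermost split
        for left in all_products(factors[:k], op):
            for right in all_products(factors[k:], op):
                results.append(op(left, right))
    return results


letters = ["a", "b", "c", "d"]

# Two arrangements for m = 3 and five for m = 4, as listed above.
assert len(all_products(letters[:3], add)) == 2
assert len(all_products(letters, add)) == 5

# String concatenation is associative: every arrangement gives the same product.
assert set(all_products(letters, add)) == {"abcd"}

# Subtraction is not associative: the arrangements give several different values.
print(sorted(set(all_products([8, 4, 2, 1], sub))))   # [1, 3, 5, 7]
```

The recursion splits the product at its outermost multiplication, which is a different decomposition from the one used in the induction argument but enumerates the same arrangements.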

In the special case $a_1 = \cdots = a_m = a$, the product $a_1 \cdots a_m$ is denoted by $a^m$ and
is called the $m$th power of the element $a$.

There is another important concept connected to the composition of mappings. 

Let $R$ be a given set. We shall denote by $\mathfrak{F}(M, R)$ the collection of all mappings
$M \to R$, and analogously, by $\mathfrak{F}(N, R)$ the collection of all mappings $N \to R$.
Then with every mapping $f : M \to N$ is associated the particular mapping
$f^* : \mathfrak{F}(N, R) \to \mathfrak{F}(M, R)$, called the dual to $f$ and defined as follows: For every
mapping $\varphi \in \mathfrak{F}(N, R)$ it assigns the mapping $f^*(\varphi) \in \mathfrak{F}(M, R)$ according to the formula


$$f^*(\varphi) = \varphi \circ f. \tag{5}$$




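Formula (5) indicates that for an arbitrary element $x \in M$, we have the equality
$f^*(\varphi)(x) = (\varphi \circ f)(x)$, which can also be expressed by the following diagram:


$$\begin{array}{ccc}
M & & \\
{\scriptstyle f}\big\downarrow & \searrow^{f^*(\varphi)} & \\
N & \xrightarrow{\;\varphi\;} & R
\end{array}$$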

Here we become acquainted with the following general mathematical fact: Func- 
tions are written in reverse order in comparison with the order of the sets on which 
they are defined. This phenomenon will appear in our book, as well as in other 
courses in relationship to more complex objects (such as differential forms). 

The dual mapping $f^*$ possesses the following important property: If we have
mappings of sets, as depicted in diagram (1), then


$$(g \circ f)^* = f^* \circ g^*. \tag{6}$$

Indeed, we obtain the dual mappings


$$\mathfrak{F}(L, R) \xrightarrow{g^*} \mathfrak{F}(N, R) \xrightarrow{f^*} \mathfrak{F}(M, R).$$

By definition, for $g \circ f : M \to L$, the dual mapping $(g \circ f)^*$ is a mapping from
$\mathfrak{F}(L, R)$ into $\mathfrak{F}(M, R)$. As can be seen from (2), $f^* \circ g^*$ is also a mapping of the
same sets. It remains for us to show that $(g \circ f)^*$ and $f^* \circ g^*$ take every element
$\psi \in \mathfrak{F}(L, R)$ to one and the same element of the set $\mathfrak{F}(M, R)$. By (5), we have


$$(g \circ f)^*(\psi) = \psi \circ (g \circ f).$$


Analogously, taking into account (2), we obtain the relationship


$$(f^* \circ g^*)(\psi) = f^*(g^*(\psi)) = f^*(\psi \circ g) = (\psi \circ g) \circ f.$$

Thus for a proof of equality (6), it suffices to verify associativity:
$\psi \circ (g \circ f) = (\psi \circ g) \circ f$.
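
The dual mapping and property (6) can be expressed with higher-order functions. The Python sketch below is an illustration under assumptions: the names `compose` and `dual` are introduced here, and the particular mappings `f`, `g`, and `psi` on the integers are hypothetical examples. Agreement on a finite sample of points illustrates the identity without proving it.

```python
def compose(g, f):
    """Return the composition g∘f: apply f first, then g."""
    return lambda x: g(f(x))


def dual(f):
    """Return f*, which sends a mapping phi : N -> R to phi∘f : M -> R."""
    return lambda phi: compose(phi, f)


# Hypothetical mappings on the integers, chosen only for illustration.
f = lambda x: x + 1          # f : M -> N
g = lambda y: 3 * y          # g : N -> L
psi = lambda z: z * z        # psi : L -> R

lhs = dual(compose(g, f))(psi)            # (g∘f)*(psi) = psi∘(g∘f)
rhs = compose(dual(f), dual(g))(psi)      # (f*∘g*)(psi) = (psi∘g)∘f

# Both sides are mappings M -> R and agree at every sampled point.
assert all(lhs(x) == rhs(x) for x in range(-5, 6))
```

In `compose(dual(f), dual(g))` the mapping `dual(g)` is applied first, so the order of `f` and `g` is reversed relative to `compose(g, f)`, exactly as in formula (6).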

Up to now, we have considered mappings (functions) of a single argument. The 
definition of functions of several arguments is reduced to this notion with the help 
of the operation of product of sets. 

Let $M_1, \ldots, M_n$ be arbitrary sets. Consider the ordered collection $(x_1, \ldots, x_n)$,
where $x_i$ is an arbitrary element of the set $M_i$. The word “ordered” indicates that
in such collections, the order of the sequence of elements $x_i$ is taken into account.
For example, in the case $n = 2$ and $M_1 = M_2$, the pairs $(x_1, x_2)$ and $(x_2, x_1)$ are
considered to be different if $x_1 \ne x_2$. A set consisting of all ordered collections
$(x_1, \ldots, x_n)$ is called the product of the sets $M_1, \ldots, M_n$ and is denoted by
$M_1 \times \cdots \times M_n$.

In the special case $M_1 = \cdots = M_n = M$, the product $M_1 \times \cdots \times M_n$ is denoted
by $M^n$ and is called the $n$th power of the set $M$.

Now we can define a function of an arbitrary number of arguments, each of which
assumes values from “its own” set. Let $M_1, \ldots, M_n$ be arbitrary sets, and let us
define $M = M_1 \times \cdots \times M_n$. By definition, the mapping $f : M \to N$ assigns to
each element $x \in M$ a certain element $y \in N$, that is, it assigns to $n$ elements
$x_1 \in M_1, \ldots, x_n \in M_n$, taken in the assigned order, the element $y = f(x_1, \ldots, x_n)$ of the
set $N$. This is a function of $n$ arguments $x_i$, each of which takes values from “its
own” set $M_i$.
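
For finite sets, the product and a function of several arguments can be written down explicitly. The following Python sketch uses `itertools.product` from the standard library; the sets `M1` and `M2` and the function `f` are hypothetical examples introduced here for illustration.

```python
from itertools import product

M1 = [0, 1]
M2 = ["a", "b", "c"]

# The product M1 x M2 as a list of ordered pairs (x1, x2).
pairs = list(product(M1, M2))
assert len(pairs) == len(M1) * len(M2)

# Order matters: (x1, x2) and (x2, x1) differ when x1 != x2.
assert (0, 1) != (1, 0)

# The nth power M^n of a single set, here with n = 3.
cube = list(product(M1, repeat=3))
assert len(cube) == len(M1) ** 3


def f(point):
    """A function of two arguments, viewed as a mapping on M1 x M2."""
    x1, x2 = point
    return x2 * (x1 + 1)      # repeat the letter x2 exactly (x1 + 1) times


values = {p: f(p) for p in pairs}
assert values[(1, "b")] == "bb"
```

Treating `f` as a mapping on pairs rather than as a two-parameter function follows the reduction described above: a function of n arguments is a function of one argument ranging over the product.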


Some Topological Notions 


Up to now, we have been speaking about sets of arbitrary form, not assuming that
they possess any additional properties. Generally, that will not suffice. For example,
let us assume that we wish to compare two geometric figures, in particular, to determine
the extent to which they are or are not “alike.” Let us consider the two figures
to be sets whose elements are points in a plane or in space. If we wish to limit ourselves
to the concepts introduced above, then it is natural to consider “alike” those
sets between which there exists a bijection. However, toward the end of the nineteenth
century, Georg Cantor demonstrated that there exists a bijection between the
points of a line segment and those of the interior of a square.² At the same time,
Richard Dedekind conjectured that our intuitive idea of “alikeness” of figures is
connected with the possibility of establishing between them a continuous bijection.
But for that, it is necessary to define what it means for a mapping to be continuous.

The branch of mathematics in which one studies continuous mappings of abstract
sets and considers objects with a precision only up to bijective continuous mappings
is called topology. Using the words of Hermann Weyl, we may say that in this book,
“the mountain range of topology will loom on the horizon.” More precisely, we
shall introduce some topological notions only now and then, and then only the simplest
ones. We shall formulate them now, but we shall appeal to them seldom, and
only to indicate a connection between the objects that we are considering with other
branches of mathematics to which the reader may be introduced in more detail in
other courses or textbooks. Such instances can be read or passed over as desired;
they will not be used in the remainder of the book. To define a continuous mapping
$f : M \to N$ it is necessary first to define the notion of convergence on the sets $M$
and $N$. In some cases, we will define convergence on sets (for example, in spaces
of vectors, spaces of matrices, or projective spaces), based on the notion of convergence
in $\mathbb{R}$ and $\mathbb{C}$, which is assumed to be familiar to the reader from a course in
calculus. In other cases, we shall make use of the notion of metric.

² This result so surprised him that, as Cantor wrote in a letter, he believed for a long time that it was
incorrect.

A set $M$ is called a metric space if there exists a function $r : M^2 \to \mathbb{R}$ assigning
to every pair of points $x, y \in M$ a number $r(x, y)$ that satisfies the following
conditions:


1. $r(x, y) > 0$ for $x \ne y$, and $r(x, x) = 0$, for every $x, y \in M$.
2. $r(x, y) = r(y, x)$ for every $x, y \in M$.
3. For any three points $x, y, z \in M$ one has the inequality


$$r(x, z) \le r(x, y) + r(y, z). \tag{7}$$


Such a function r(x, y) is called a metric or distance on M, and the properties 
enumerated in its definition constitute an axiomatization of the usual properties of 
distance known from courses in elementary or analytic geometry. 

For example, the set $\mathbb{R}$ of all real numbers (and also any subset of it) becomes
a metric space if for every pair of numbers $x$ and $y$ we introduce the function
$r(x, y) = |x - y|$ or $r(x, y) = \sqrt{|x - y|}$.
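
The three conditions in the definition of a metric can be tested on a finite collection of points. The Python sketch below is an illustration under assumptions: the helper name `is_metric_on` and the sample of integer points are introduced here, and the squared difference is added only as a counterexample that is not discussed in the text. Passing the check on a finite sample is necessary for being a metric but does not prove it.

```python
from itertools import product


def is_metric_on(points, r):
    """Check conditions 1 through 3 for r on a finite collection of points."""
    for x, y in product(points, repeat=2):
        if x == y and r(x, y) != 0:          # condition 1: r(x, x) = 0
            return False
        if x != y and not r(x, y) > 0:       # condition 1: r(x, y) > 0 for x != y
            return False
        if r(x, y) != r(y, x):               # condition 2: symmetry
            return False
    # Condition 3: the triangle inequality (7) for every triple of points.
    return all(r(x, z) <= r(x, y) + r(y, z)
               for x, y, z in product(points, repeat=3))


points = range(-3, 4)                        # integer sample, so no rounding issues

assert is_metric_on(points, lambda x, y: abs(x - y))

# The squared difference fails inequality (7): r(0, 2) = 4 > r(0, 1) + r(1, 2) = 2.
assert not is_metric_on(points, lambda x, y: (x - y) ** 2)
```

Integer sample points are used so that every comparison is exact; with floating-point values the equality tests would need a tolerance.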

For an arbitrary metric space there is automatically defined the notion of convergence
of points in the space: a sequence of points $x_k$ converges to the point $x$ as
$k \to \infty$ (notation: $x_k \to x$) if $r(x_k, x) \to 0$ as $k \to \infty$. The point $x$ in this case is
called the limit of the sequence $x_k$.

Let $X \subset M$ be some subset of $M$, and $M$ a metric space with the metric $r(x, y)$,
that is, a mapping $r : M^2 \to \mathbb{R}$ satisfying the three properties given above. It is clear
that the restriction of $r(x, y)$ to the subset $X^2 \subset M^2$ also satisfies those properties,
and hence it defines a metric on $X$. We say that $X$ is a metric space with the metric
induced by the metric of the enclosing space $M$ or that $X \subset M$ is a metric subspace.

The subset $X$ is said to be closed in $M$ if it contains the limit of every
sequence of points of $X$ that converges in $M$, and it is said to be bounded if there
exist a point $x \in X$ and a number $c > 0$ such that $r(x, y) < c$ for all $y \in X$.

Let $M$ and $N$ be sets on each of which is defined the notion of convergence (for
example, $M$ and $N$ could be metric spaces). A mapping $f : M \to N$ is said to be
continuous at the point $x \in M$ if for every convergent sequence $x_k \to x$ of points
in the set $M$, one has $f(x_k) \to f(x)$. If the mapping $f : M \to N$ is continuous at
every point $x \in M$, then we say that it is continuous on the set $M$ or simply that it is
continuous.

The mapping $f : M \to N$ is called a homeomorphism if it is bijective and both $f$
and its inverse mapping $f^{-1} : N \to M$ are continuous.³ The sets
$M$ and $N$ are said to be homeomorphic or topologically equivalent if there exists
a homeomorphism $f : M \to N$. It is easily seen that the property among sets of
being homeomorphic (for a given fixed definition of convergence) is an equivalence
relation.

Given two infinite sets $M$ and $N$ on which no metrics have initially been defined,
if we then supply them with metrics using first one definition and then another, we
will obtain differing notions of homeomorphism $f : M \to N$, and it can turn out
that in one type of metric, $M$ and $N$ are homeomorphic, while in another type they
are not. For example, on arbitrary sets $M$ and $N$ let us define what is called the
discrete metric, defined by the relations $r(x, y) = 1$ for all $x \ne y$ and $r(x, x) = 0$
for all $x$. It is clear that with such a definition, all the properties of a metric are


³We wish to emphasize that this last condition is essential: from the continuity of $f$ one may not
conclude the continuity of $f^{-1}$.




Fig. 1 Homeomorphic and nonhomeomorphic curves (the symbol $\sim$ means that the figures are
homeomorphic, while $\not\sim$ means that they are not)


satisfied, but the notion of homeomorphism $f : M \to N$ becomes empty: it simply
coincides with the notion of bijection. For indeed, in the discrete metric, a sequence
$x_k$ converges to $x$ if and only if, beginning with some index, all the points $x_k$ are equal to $x$.
As follows from the definition of continuous mapping given above, this means that
every mapping $f : M \to N$ is continuous.
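
The discrete metric is easy to experiment with. The following is a minimal sketch, not part of the original text: the helper names `discrete_metric` and `is_metric_on` are hypothetical, and the check runs only over a small finite sample of points, so it illustrates the three axioms rather than proving them.

```python
from itertools import product

def discrete_metric(x, y):
    """Discrete metric: distance 1 between distinct points, 0 from a point to itself."""
    return 0 if x == y else 1

def is_metric_on(points, r):
    """Check the three metric axioms for r on a finite collection of points."""
    for x, y in product(points, repeat=2):
        if (x == y) != (r(x, y) == 0) or r(x, y) < 0:
            return False            # axiom 1: r > 0 off the diagonal, r = 0 on it
        if r(x, y) != r(y, x):
            return False            # axiom 2: symmetry
    return all(r(x, z) <= r(x, y) + r(y, z)      # axiom 3: triangle inequality (7)
               for x, y, z in product(points, repeat=3))

sample = ["a", "b", "c", 1, 2.5]
print(is_metric_on(sample, discrete_metric))

# For contrast, the squared difference on numbers violates inequality (7).
print(is_metric_on([0, 1, 2], lambda x, y: (x - y) ** 2))
```

The second call fails because the squared difference gives $r(0, 2) = 4$, which exceeds $r(0, 1) + r(1, 2) = 2$.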

For example, according to a theorem of Cantor, a line segment and a square are 
homeomorphic under the discrete metric, but if we consider them, for example, as 
metric spaces in the plane on which distance is defined as in a course in elementary 
geometry (let us say using the system of Cartesian coordinates), then the two sets 
are no longer homeomorphic. 

This shows that the discrete metric fails to reflect some important properties of 
distance with which we are familiar from courses in geometry, one of which is that 
for an arbitrarily small number $\varepsilon > 0$, there exist two distinct points $x$ and $y$ for
which $r(x, y) < \varepsilon$. Therefore, if we are to formulate our intuitive idea of “geometric
similarity” of two sets $M$ and $N$, it is necessary to consider them not with an
arbitrary metric, but with a metric that reflects these geometric notions. 

We are not going to go more deeply into this question, since for our purposes that 
is unnecessary. In this book, when we “compare” sets M and N, where at least one 
of them (say N) is a geometric figure in the plane (or in space), then distance will be
determined in the usual way, with the metric on N induced by the metric in the plane 
(or in the space) in which it lies. It remains for us to define the metric (or notion of 
convergence) on the set M in such a way that M and N are homeomorphic. That is 
how we shall make precise the idea of comparison. 

If the figures M and N are metric subspaces of the plane or space with distance 
defined as in elementary geometry, then there exists for them a very graphic inter- 
pretation of the concept of topological equivalence. Imagine that figures M and N 
are made out of rubber. Then their being homeomorphic means that we can deform 
M into N without tearing and without gluing together any points. This last condi- 
tion (“without tearing and without gluing together any points”) is what makes the 
notion of homeomorphism much stronger than simply a bijective mapping of sets. 

For example, an arbitrary continuous closed curve without self-intersection (for 
example, a triangle or square) is homeomorphic to a circle. On the other hand, a con- 
tinuous closed curve with self-intersection (say a figure eight) is not homeomorphic 
to a circle (see Fig. 1). 

In Fig. 2 we have likewise depicted examples of homeomorphic and nonhomeo- 
morphic figures, this time in three-dimensional space. 

We conclude by introducing a few additional simple topological concepts that 
will be used in this book. 




Fig. 2 Homeomorphic and nonhomeomorphic surfaces (panels: pyramid; sphere; sphere with
handle (weight); torus (doughnut); sphere with two handles)


A path in a metric space $M$ is a continuous mapping $f : I \to M$, where $I$ is
an interval of the real line. Without any loss of generality, we may assume that
$I = [0, 1]$. In this case, the points $f(0)$ and $f(1)$ are called the beginning and end
of the path. Two points $x, y \in M$ are said to be continuously deformable into each
other if there is a path in which $x$ is the beginning and $y$ is the end. Such a path
is called a deformation of $x$ into $y$, and we shall notate the fact that $x$ and $y$ are
deformable into one another by $x \sim y$.

The property for elements of a space $M$ to be continuously deformable into one
another is an equivalence relation on $M$, since properties 1 through 3 that define such
a relation are satisfied. Indeed, the reflexive property is obvious. To prove symmetry,
it suffices to observe that if $f(t)$ is a deformation of $x$ into $y$, then $f(1 - t)$ is a
deformation of $y$ into $x$. Now let us verify transitivity. Let $x \sim y$ and $y \sim z$, $f(t)$
a deformation of $x$ into $y$, and $g(t)$ a deformation of $y$ into $z$. Then the mapping
$h : I \to M$ determined by the equality $h(t) = f(2t)$ for $t \in [0, \frac{1}{2}]$ and the equality
$h(t) = g(2t - 1)$ for $t \in [\frac{1}{2}, 1]$ is continuous (the two formulas agree at $t = \frac{1}{2}$,
since $f(1) = y = g(0)$), and for this mapping, the equalities
$h(0) = f(0) = x$, $h(1) = g(1) = z$ are satisfied. Thus $h(t)$ gives a continuous
deformation of the point $x$ to $z$, and therefore we have $x \sim z$.
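
The two constructions used in this argument, reversing a path and joining two paths end to end, can be sketched in a few lines. This is only an illustration under stated assumptions: the names `reverse` and `concatenate` are hypothetical, paths are modelled as Python functions on $[0, 1]$ with values in the plane, and the sample points $x = (0, 0)$, $y = (1, 0)$, $z = (1, 1)$ are chosen here for the example.

```python
def reverse(f):
    """Deformation of y into x obtained from a deformation f of x into y."""
    return lambda t: f(1 - t)

def concatenate(f, g):
    """Path h with h(t) = f(2t) on [0, 1/2] and h(t) = g(2t - 1) on [1/2, 1].

    Assumes f(1) == g(0), so that the two halves agree at t = 1/2.
    """
    return lambda t: f(2 * t) if t <= 0.5 else g(2 * t - 1)

# Hypothetical example in the plane: x = (0, 0), y = (1, 0), z = (1, 1).
f = lambda t: (t, 0.0)        # straight segment from x to y
g = lambda t: (1.0, t)        # straight segment from y to z
h = concatenate(f, g)

print(h(0), h(0.5), h(1))     # beginning x, midpoint y, end z
print(reverse(h)(0))          # the reversed path begins at z
```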

If every pair of elements of a metric space $M$ can be deformed one into the other
(that is, the relationship $\sim$ defines a single equivalence class), then the space $M$ is
said to be path-connected. If that is not the case, then for each element $x \in M$ we
consider the equivalence class $M_x$ consisting of all elements $y \in M$ such that $x \sim y$.
By the definition of equivalence class, the metric space $M_x$ will be path-connected.
It is called the path-connected component of the space $M$ containing the point $x$.
Thus the equivalence relation defined by a continuous deformation decomposes $M$
into path-connected components.

In a number of important cases, the number of components is finite, and we
obtain the representation $M = M_1 \cup \cdots \cup M_k$, where $M_i \cap M_j = \varnothing$ for $i \ne j$ and
each $M_i$ is path-connected. It is easily seen that such a representation is unique. The
sets $M_i$ are called the path-connected components of the space $M$.

For example, a hyperboloid of one sheet, a sphere, and a cone are each path- 
connected, but a hyperboloid of two sheets is not: it has two path-connected com- 
ponents. The set of real numbers defined by the condition 0 < |x| < 1 has two 
path-connected components (one containing positive numbers; the other, negative 
numbers), while the set of complex numbers defined by the same condition is path- 
connected. The properties preserved by homeomorphisms are called topological 




properties. Thus, for example, the property of path-connectedness is topological, 
as is the number of path-connected components. 

Let M and N be metric spaces (let us denote their respective metrics by r and r’). 
A mapping f : M — N is called an isometry if it is bijective and preserves distances 
between points, that is, 


$$r(x_1, x_2) = r'(f(x_1), f(x_2)) \tag{8}$$


for every pair of points $x_1, x_2 \in M$. From the relationship (8), it follows automatically
that an isometry is an embedding. Indeed, if there existed points $x_1 \ne x_2$ in the
set $M$ for which the equation $f(x_1) = f(x_2)$ were satisfied, then from condition 1
in the definition of a metric space, the left-hand side of (8) would be different from
zero, while the right-hand side would be equal to zero. Therefore, the requirement
of a bijective mapping is here reduced to the condition that the image $f(M)$
coincide with all of the set $N$.

Metric spaces $M$ and $N$ are called isometric or metrically equivalent if there exists
an isometry $f : M \to N$. It is easy to see that an isometry is a homeomorphism
and generalizes the notion of the motion of a rigid body in space, whereby we can- 
not arbitrarily deform the sets M and N into one another as if they were made of 
rubber (without tearing and gluing). We can only treat them as if they were rigid 
or made of flexible, but not compressible or stretchable, materials (for example, an 
isometry of a piece of paper is obtained by bending it or rolling it up). 

In the plane or in space with distance determined by the familiar methods of el- 
ementary geometry, examples of isometries are parallel translations, rotations, and 
symmetry transformations. Thus, for example, two triangles in the plane are iso- 
metric if and only if they are “equal” (that is, congruent in the sense defined in 
courses in school geometry, namely equality of sides and angles), and two ellipses 
are isometric if and only if they have equal major and minor axes. 

In conclusion, we observe that in the definition of homeomorphism, path- 
connectedness, and path-connected component, the notion of metric played only 
an auxiliary role. We used it to define the notion of convergence of a sequence of 
points, so that we could speak of continuity of a mapping and thereby introduce 
concepts that depend on this notion. It is convergence that is the basic topological 
notion. It can be defined by various metrics, and it can also be defined in another 
way, as is usually done in topology. 


Chapter 1 
Linear Equations 


1.1 Linear Equations and Functions 


In this chapter, we will be studying systems of equations of degree one. We shall 
let the number of equations and number of unknowns be arbitrary. We begin by 
choosing suitable notation. Since the number of unknowns can be arbitrarily large, 
it will not suffice to use the twenty-six letters of the alphabet: x, y,..., z, and so on. 
Therefore, we shall use a single letter to designate all the unknowns and distinguish 
among them with an index, or subscript: $x_1, x_2, \ldots, x_n$, where $n$ is the number of
unknowns. The coefficients of our equations will be notated using the same principle,
and a single equation of the first degree will be written thus: 


$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b. \tag{1.1}$$


A first-degree equation is also called a linear equation.

We shall use the same principle to distinguish among the various equations. But
since we have already used one index for designating the coefficients of the unknowns,
we introduce a second index. We shall denote the coefficient of $x_k$ in the
$i$th equation by $a_{ik}$. To the right side of the $i$th equation we attach the symbol $b_i$.
Therefore, the $i$th equation is written


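$$a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n = b_i, \tag{1.2}$$

and a system of $m$ equations in $n$ unknowns will look like this:


$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2,\\
&\ \,\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m.
\end{aligned} \tag{1.3}$$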


The numbers $b_1, \ldots, b_m$ are called the constant terms or just constants of the system
(1.3). It will sometimes be convenient to focus our attention on the coefficients of 


I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry,
DOI 10.1007/978-3-642-30994-6_1, © Springer-Verlag Berlin Heidelberg 2013




the unknowns in system (1.3), and then we shall use the following tableau: 


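$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}, \tag{1.4}$$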


with $m$ rows and $n$ columns. Such a rectangular array of numbers is called an $m \times n$
matrix or a matrix of type $(m, n)$, and the numbers $a_{ik}$ are called the elements of
the matrix. If $m = n$, then the matrix is an $n \times n$ square matrix. In this case, the
elements $a_{11}, a_{22}, \ldots, a_{nn}$, each located in a row and column with the same index,
form the matrix’s main diagonal.

The matrix (1.4), whose elements are the coefficients of the unknowns of system 
(1.3), is called the matrix associated with the system. Along with the matrix (1.4), it 
is frequently necessary to consider the matrix that includes the constant terms: 


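$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1\\
a_{21} & a_{22} & \cdots & a_{2n} & b_2\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{pmatrix}. \tag{1.5}$$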


This matrix has one column more than matrix (1.4), and thus it is an $m \times (n + 1)$
matrix. Matrix (1.5) is called the augmented matrix of the system (1.3).
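
As a small illustration (the particular numbers are chosen here for the example and do not come from the text), consider a system of $m = 2$ equations in $n = 3$ unknowns. Its associated matrix is of type $(2, 3)$ and its augmented matrix is of type $(2, 4)$; an unknown that does not appear in an equation contributes the coefficient 0.

```latex
\[
\begin{aligned}
2x_1 - x_2 + 3x_3 &= 4,\\
x_1 + 5x_3 &= -2
\end{aligned}
\qquad\longrightarrow\qquad
\begin{pmatrix} 2 & -1 & 3\\ 1 & 0 & 5 \end{pmatrix},
\qquad
\begin{pmatrix} 2 & -1 & 3 & 4\\ 1 & 0 & 5 & -2 \end{pmatrix}.
\]
```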

Let us consider in greater detail the left-hand side of equation (1.1). Here we
are usually talking about trying to find specific values of the unknowns $x_1, \ldots, x_n$
that satisfy the relationship (1.1). But it is also possible to consider the expression
$a_1x_1 + a_2x_2 + \cdots + a_nx_n$ from another point of view. We can substitute arbitrary
numbers


$$x_1 = c_1, \quad x_2 = c_2, \quad \ldots, \quad x_n = c_n, \tag{1.6}$$

for the unknowns $x_1, x_2, \ldots, x_n$ in the expression, each time obtaining as a result a
certain number

$$a_1c_1 + a_2c_2 + \cdots + a_nc_n. \tag{1.7}$$


From this point of view, we are dealing with a certain type of function. In the given
situation, the initial element to which we are associating something is the set of
values (1.6), which is determined simply by the set of numbers $(c_1, c_2, \ldots, c_n)$. We
shall call such a set of numbers a row of length $n$. It is the same as a $1 \times n$ matrix.
We associate the expression (1.7), which is a number, with the row $(c_1, c_2, \ldots, c_n)$.
Then employing the notation of page xiii, we obtain a function on the set $M$ with
values in $N$, where $M$ is the set of all rows of length $n$, and $N$ is the set of all
numbers.


Definition 1.1 A function $F$ on the set of all rows of length $n$ with values in the set
of all numbers is said to be linear if there exist numbers $a_1, a_2, \ldots, a_n$ such that $F$
associates to each row $(c_1, c_2, \ldots, c_n)$ the number (1.7).




We shall proceed to denote a row by a single boldface italic letter, such as $c$,
and shall associate with it a number, $F(c)$, via the linear function $F$. Thus if $c =
(c_1, c_2, \ldots, c_n)$, then $F(c) = a_1c_1 + a_2c_2 + \cdots + a_nc_n$.

In the case n = 1, a linear function coincides with the well-known concept of 
direct proportionality, which will be familiar to the reader from secondary-school 
mathematics. Thus the notion of linear function is a natural generalization of direct 
proportionality. To emphasize this analogy, we shall define some operations on rows 
of length n in analogy to arithmetic operations on numbers. 


Definition 1.2 Let $c$ and $d$ be rows of a fixed length $n$, that is,

$$c = (c_1, c_2, \ldots, c_n), \qquad d = (d_1, d_2, \ldots, d_n).$$

Their sum is the row $(c_1 + d_1, c_2 + d_2, \ldots, c_n + d_n)$, denoted by $c + d$. The product
of row $c$ and the number $p$ is the row $(pc_1, pc_2, \ldots, pc_n)$, denoted by $pc$.


Theorem 1.3 A function F on the set of rows of length n is linear if and only if it 
possesses the following properties: 


$$F(c + d) = F(c) + F(d), \tag{1.8}$$
$$F(pc) = pF(c), \tag{1.9}$$


for all rows $c, d$ and all numbers $p$.
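
Properties (1.8) and (1.9) can be checked numerically for a particular linear function. The sketch below is illustrative only: the helper names `add_rows`, `scale_row` and `linear_function`, the coefficients $(2, -1, 3)$ and the sample rows are assumptions made for the example, and a check on sample rows is of course not a proof.

```python
from fractions import Fraction

def add_rows(c, d):
    """Sum of two rows of the same length (Definition 1.2)."""
    return tuple(ci + di for ci, di in zip(c, d))

def scale_row(p, c):
    """Product of the number p and the row c (Definition 1.2)."""
    return tuple(p * ci for ci in c)

def linear_function(a):
    """Return F with F(c) = a_1 c_1 + ... + a_n c_n, as in (1.7)."""
    return lambda c: sum(ai * ci for ai, ci in zip(a, c))

F = linear_function((2, -1, 3))            # illustrative coefficients
c, d, p = (1, 4, 0), (5, -2, 7), Fraction(3, 2)

print(F(add_rows(c, d)) == F(c) + F(d))    # property (1.8)
print(F(scale_row(p, c)) == p * F(c))      # property (1.9)
```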


Proof Properties (1.8) and (1.9) are the direct analogue of the well-known conditions
for direct proportionality.

Let us first show that every linear function possesses properties (1.8) and (1.9); this
is completely obvious. Let the linear
function $F$ associate to each row $c = (c_1, c_2, \ldots, c_n)$ the number (1.7). By the
above definition, the sum of rows $c = (c_1, \ldots, c_n)$ and $d = (d_1, \ldots, d_n)$ is the row
$c + d = (c_1 + d_1, \ldots, c_n + d_n)$, and it follows that


$$\begin{aligned}
F(c + d) &= a_1(c_1 + d_1) + \cdots + a_n(c_n + d_n)\\
&= (a_1c_1 + a_1d_1) + \cdots + (a_nc_n + a_nd_n)\\
&= (a_1c_1 + \cdots + a_nc_n) + (a_1d_1 + \cdots + a_nd_n)\\
&= F(c) + F(d),
\end{aligned}$$


which is equation (1.8). In exactly the same way, we obtain

$$F(pc) = a_1(pc_1) + \cdots + a_n(pc_n) = p(a_1c_1 + \cdots + a_nc_n) = pF(c).$$


Let us now prove the reverse assertion: any function $F$ on the set of rows of length
$n$ with numerical values satisfying properties (1.8) and (1.9) is linear. To show this,
let us consider the row $e_i$ in which every entry except the $i$th is equal to zero, while
the $i$th is equal to 1, that is, $e_i = (0, \ldots, 1, \ldots, 0)$, where the 1 is in the $i$th place.




Let us set $F(e_i) = a_i$ and let us prove that for an arbitrary row $c = (c_1, \ldots, c_n)$, the
following equality is satisfied: $F(c) = a_1c_1 + \cdots + a_nc_n$. From that we will be able
to conclude that the function $F$ is linear.

For this, let us convince ourselves that $c = c_1e_1 + \cdots + c_ne_n$. This is almost
obvious: let us consider what number is located at the $i$th place in the row $c_1e_1 +
\cdots + c_ne_n$. In any row $e_k$ with $k \ne i$, there is a 0 in the $i$th place, and therefore, the
same is true for $c_ke_k$, while in the row $c_ie_i$, the element $c_i$ is located at
the $i$th place. As a result, in the complete sum $c_1e_1 + \cdots + c_ne_n$, there is $c_i$ at the
$i$th place. This is true for arbitrary $i$, which implies that the sum under consideration
coincides with the row $c$.

Now let us consider $F(c)$. Using properties (1.8) and (1.9) $n$ times, we obtain


$$\begin{aligned}
F(c) &= F(c_1e_1) + F(c_2e_2 + \cdots + c_ne_n) = c_1F(e_1) + F(c_2e_2 + \cdots + c_ne_n)\\
&= a_1c_1 + F(c_2e_2 + \cdots + c_ne_n) = a_1c_1 + a_2c_2 + F(c_3e_3 + \cdots + c_ne_n)\\
&= \cdots = a_1c_1 + a_2c_2 + \cdots + a_nc_n,
\end{aligned}$$


as asserted. 
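
The second half of the proof is constructive: the coefficients are read off as $a_i = F(e_i)$. A minimal sketch of this idea follows; `unit_row` and `coefficients` are hypothetical helper names, and the sample function `F` is an assumption chosen so that it satisfies (1.8) and (1.9) without being written in the form (1.7).

```python
def unit_row(i, n):
    """Row e_i of length n: 1 in the ith place (1-based), 0 elsewhere."""
    return tuple(1 if k == i else 0 for k in range(1, n + 1))

def coefficients(F, n):
    """Recover a_i = F(e_i) for a function F assumed to satisfy (1.8) and (1.9)."""
    return tuple(F(unit_row(i, n)) for i in range(1, n + 1))

# A function on rows of length 3 given by a rule other than an explicit formula (1.7).
def F(c):
    return 4 * c[0] - (c[1] + c[1]) + c[2] * 5 - c[0]

a = coefficients(F, 3)
c = (7, -3, 2)
print(a)                                             # the recovered coefficients
print(F(c) == sum(ai * ci for ai, ci in zip(a, c)))  # F agrees with formula (1.7)
```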


We shall soon convince ourselves of the usefulness of these properties of a linear 
function. Let us define the operations on linear functions that we shall be meeting 
in the sequel. 


Definition 1.4 Let $F$ and $G$ be two linear functions on the set of rows of length $n$.
Their sum is the function $F + G$, on the same set, defined by the equality
$(F + G)(c) = F(c) + G(c)$ for every row $c$. The product of the linear function $F$ and the
number $p$ is the function $pF$, defined by the relation $(pF)(c) = p \cdot F(c)$.


Using Theorem 1.3, we obtain that both F + G and pF are linear functions. 
We return now to the system of linear equations (1.3). Clearly, it can be written 
in the form 


$$\begin{aligned}
F_1(x) &= b_1,\\
&\ \,\vdots\\
F_m(x) &= b_m,
\end{aligned} \tag{1.10}$$

where $F_1(x), \ldots, F_m(x)$ are linear functions defined by the relationships


$$F_i(x) = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n.$$


A row $c$ is called a solution of the system (1.10) if on substituting $c$ for $x$, all the
equations are transformed into identities, that is, $F_1(c) = b_1, \ldots, F_m(c) = b_m$.
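
Testing whether a given row is a solution amounts to evaluating each linear function and comparing with the constant term. The following sketch assumes a hypothetical helper `is_solution` and an illustrative system of two equations in two unknowns that does not come from the text.

```python
def is_solution(A, b, c):
    """True if the row c satisfies F_i(c) = b_i for every equation of the system.

    A is the m x n matrix of coefficients (a list of rows), b the constant terms.
    """
    return all(sum(a_ik * c_k for a_ik, c_k in zip(row, c)) == b_i
               for row, b_i in zip(A, b))

# Illustrative system:  x_1 + 2 x_2 = 5,   3 x_1 - x_2 = 1.
A = [[1, 2], [3, -1]]
b = [5, 1]
print(is_solution(A, b, (1, 2)))   # 1 + 4 = 5 and 3 - 2 = 1
print(is_solution(A, b, (5, 0)))   # first equation holds, second does not
```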

Pay attention to the word “if”! Not every system of equations has a solution. For 
example, the system 


$$\begin{aligned}
x_1 + x_2 + \cdots + x_{100} &= 0,\\
x_1 + x_2 + \cdots + x_{100} &= 1,
\end{aligned}$$




Fig. 1.1 The intersection of two lines


of two equations in one hundred unknowns clearly cannot have any solution. 


Definition 1.5 A system possessing at least one solution is said to be consistent, 
while a system with no solutions is called inconsistent. If a system is consistent 
and has only one solution, then it is said to be definite, and if it has more than one 
solution, it is indefinite. 


A definite system is also called uniquely determined, since it has precisely one 
solution. 

Definite systems of equations are encountered frequently, for instance when from 
external considerations it is clear that there is only one solution. For example, sup- 
pose we wish to find the unique point lying on the lines defined by the equations 
x = y and x + y = 1; see Fig. 1.1. It is clear that these lines are not parallel and
therefore have exactly one point of intersection. This means that the system consist- 
ing of the equations of these two lines is definite. It is easy to find its unique solution 
by a simple calculation. To do so, one may substitute the condition y = x into the 
second equation. This yields 2x = 1, that is, x = 1/2, and since y = x, we have also 
y=1/2. 

The reader has almost certainly encountered indefinite systems in secondary 
school, for example, the system 


$$\begin{aligned}
x - 2y &= 1,\\
3x - 6y &= 3.
\end{aligned} \tag{1.11}$$

It is obvious that the second equation is obtained by multiplying the first equation 
by 3. Therefore, the system is satisfied by all x and y that satisfy the first equation. 
From the first equation, we obtain 2y = x — 1, or equivalently, y = (x — 1)/2. We 
can now choose an arbitrary value for x and obtain the corresponding value y = 
(x — 1)/2. Our system thus has infinitely many solutions and is therefore indefinite. 
We have now seen examples of the following types of systems of equations: 


(a) having no solutions (inconsistent), 
(b) having a unique solution (consistent and definite), 
(c) having infinitely many solutions (for example, system (1.11)). 


Let us show that these three cases are the only possibilities. 




Theorem 1.6 If a system of linear equations is consistent and indefinite, then it has
infinitely many solutions. 


Proof By the hypothesis of the theorem, we have a system of linear equations that
is consistent and that has more than one solution. This means that it has at
least two distinct solutions: $c$ and $d$. We shall now construct an infinite number of
solutions.

To do so, we consider, for an arbitrary number $p$, the row $r = pc + (1 - p)d$. We
shall show first of all that the row $r$ is also a solution. We suppose our system to be
written in the form (1.10). Then we must show that $F_i(r) = b_i$ for all $i = 1, \ldots, m$.
Using properties (1.8) and (1.9), we obtain


$$F_i(r) = F_i(pc + (1 - p)d) = pF_i(c) + (1 - p)F_i(d) = pb_i + (1 - p)b_i = b_i,$$


since $c$ and $d$ are solutions of the system of equations (1.10), that is, $F_i(c) =
F_i(d) = b_i$ for all $i = 1, \ldots, m$.

It remains to verify that for different numbers $p$ we obtain different solutions.
Then we will have shown that we have infinitely many of them. Let us suppose that
two different numbers $p$ and $p'$ yield the same solution $pc + (1 - p)d = p'c + (1 -
p')d$. We observe that we can operate on rows just as on numbers in that we can
move terms from one side of the equation to the other and remove a common factor
from the terms inside parentheses. This is justified because we defined operations
on rows in terms of operations on the numbers that constitute them. As a result, we
obtain the relation $(p - p')c = (p - p')d$. Since by assumption, $p \ne p'$, we can
cancel the factor $p - p'$. On doing so, we obtain $c = d$, but by hypothesis, $c$ and $d$
were distinct solutions. From this contradiction, we conclude that every choice of $p$
yields a distinct solution.
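
The construction in this proof can be tried out on system (1.11), taking its first equation to be $x - 2y = 1$, as the relation $2y = x - 1$ derived from it in the text indicates. The sketch below is an illustration under that assumption; the helper names `is_solution` and `combine`, the two starting solutions and the sample values of $p$ are choices made for the example.

```python
from fractions import Fraction

A = [[1, -2], [3, -6]]     # coefficients of x - 2y = 1 and 3x - 6y = 3
b = [1, 3]

def is_solution(c):
    """True if the row c satisfies both equations of the system."""
    return all(sum(a * x for a, x in zip(row, c)) == bi for row, bi in zip(A, b))

def combine(p, c, d):
    """The row r = p c + (1 - p) d from the proof of Theorem 1.6."""
    return tuple(p * ci + (1 - p) * di for ci, di in zip(c, d))

c, d = (1, 0), (3, 1)      # two distinct solutions
rows = [combine(Fraction(k, 4), c, d) for k in range(-4, 9)]

print(all(is_solution(r) for r in rows))   # every r is again a solution
print(len(set(rows)) == len(rows))         # different p give different rows
```

Here $r = (3 - 2p, 1 - p)$, so $x - 2y = 3 - 2p - 2(1 - p) = 1$ for every $p$, in agreement with the general argument.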


1.2 Gaussian Elimination 


Our goal now is to demonstrate a method of determining to which of the three types 
mentioned in the previous section a given system of linear equations belongs, that is, 
whether it is consistent, and if so, whether it is definite. If it is consistent and definite, 
then we would like to find its unique solution, and if it is consistent and indefinite, 
then we want to write down its solutions in some useful form. There exists a simple 
method that is effective in each concrete situation. It is called Gaussian elimination, 
or Gauss’s method, and we now present it. We are going to be dealing here with 
proof by induction. That is, beginning with the simplest case, with m = 1 equation,
we then move on to the case m = 2, and so on, so that in considering the general 
case of a system of m linear equations, we shall assume that we have proved the 
result for systems with fewer than m equations. 

The method of Gaussian elimination is based on the idea of replacing the given 
system of linear equations with another system having the same solutions. Let us 




consider along with system (1.10) another system of linear equations in the same 
number of unknowns: 


$$\begin{aligned}
G_1(x) &= f_1,\\
&\ \,\vdots\\
G_l(x) &= f_l,
\end{aligned} \tag{1.12}$$

where $G_1(x), \ldots, G_l(x)$ are some other linear functions in $n$ unknowns. The system
(1.12) is said to be equivalent to system (1.10) if both systems have exactly the same
solutions, that is, any solution of system (1.10) is also a solution of system (1.12),
and vice versa.

The idea behind Gaussian elimination is to use certain elementary row operations 
on the system that replace a system with an equivalent but simpler system for which 
the answers to the questions about solutions posed above are obvious. 


Definition 1.7 An elementary row operation of type I on system (1.3) or (1.10) 
consists in the transposition of two rows. So that there will be no uncertainty about 
what we mean, let us be precise: under this row operation, all the equations of the 
system other than the ith and the kth are left unchanged, while the ith and kth
exchange places. 


Thus the number of elementary row operations of type I is equal to the number
of pairs $i, k$, $i \ne k$, that is, the number of combinations of $m$ things taken 2 at a time.


Definition 1.8 An elementary row operation of type II consists in the replacement
of the given system by another in which all equations except the $i$th remain as before,
and to the $i$th equation is added $c$ times the $k$th equation. As a result, the $i$th
equation in system (1.3) takes the form


$$(a_{i1} + ca_{k1})x_1 + (a_{i2} + ca_{k2})x_2 + \cdots + (a_{in} + ca_{kn})x_n = b_i + cb_k. \tag{1.13}$$


An elementary row operation of type II depends on the choice of the indices i 
and k and the number c, and so there are infinitely many row operations of this type. 


Theorem 1.9 Application of an elementary row operation of type I or II results in 
a system that is equivalent to the original one. 


Proof The assertion is completely obvious in the case of an elementary row oper- 
ation of type I: whatever solutions a system may have cannot depend on the nu- 
meration of its equations (that is, on the ordering of the system (1.3) or (1.10)). We 
could even not number the equations at all, but write each of them, for example, on 
a separate piece of paper. 

In the case of an elementary row operation of type II, the assertion is also fairly
obvious. Any solution $c = (c_1, \ldots, c_n)$ of the first system after the substitution satisfies
all the equations obtained under this elementary row operation except possibly



the $i$th, simply because they are identical to the equations of the original system.
It remains to settle the question for the $i$th equation. Since $c$ was a solution of the
original system, we have the following equalities:


$$\begin{aligned}
a_{i1}c_1 + a_{i2}c_2 + \cdots + a_{in}c_n &= b_i,\\
a_{k1}c_1 + a_{k2}c_2 + \cdots + a_{kn}c_n &= b_k.
\end{aligned}$$


After adding $c$ times the second of these equations to the first, we obtain equality
(1.13) for $x_1 = c_1, \ldots, x_n = c_n$. This means that $c$ satisfies the $i$th equation of the
new system; that is, $c$ is a solution.

It remains to prove the reverse assertion, that any solution of the system obtained 
by a row operation of type II is a solution of the original system. To this end, we 
observe that adding —c times the kth equation to equation (1.13) yields the ith 
equation of the original system. That is, the original system is obtained from the 
new system by an elementary row operation of type II using the factor —c. Thus, 
the previous line of argument shows that any solution of the new system obtained by 
an elementary row operation of type II is also a solution of the original system. 
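
Both types of elementary row operation act naturally on the augmented matrix (1.5). The following is a minimal sketch with hypothetical helper names `swap_rows` and `add_multiple`, zero-based row indices (a Python convention, not the text's), and an illustrative system that is not taken from the text. It also shows the key point of the proof: the operation with factor $-c$ undoes the operation with factor $c$.

```python
def swap_rows(M, i, k):
    """Type I: exchange equations i and k (0-based) of the augmented matrix M."""
    M = [row[:] for row in M]
    M[i], M[k] = M[k], M[i]
    return M

def add_multiple(M, i, k, c):
    """Type II: add c times equation k to equation i (i != k), as in (1.13)."""
    M = [row[:] for row in M]
    M[i] = [x + c * y for x, y in zip(M[i], M[k])]
    return M

# Augmented matrix of the illustrative system x_1 + 2 x_2 = 5, 3 x_1 - x_2 = 1.
M = [[1, 2, 5], [3, -1, 1]]
M1 = add_multiple(M, 1, 0, -3)          # eliminates x_1 from the second equation
print(M1)
print(add_multiple(M1, 1, 0, 3) == M)   # the operation with -c undoes it
```

Both functions return a new matrix and leave their argument unchanged, which makes it easy to compare a system with its transform.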


Let us now consider Gauss’s elimination method. As our first operation, let us
perform on system (1.3) an elementary row operation of type I by transposing the
first equation and any other in which $x_1$ appears with a coefficient different from 0.
If the first equation possesses this property, then no such transposition is necessary.
Now, it can happen that $x_1$ appears in all the equations with coefficient 0 (that is, $x_1$
does not appear at all in the equations). In that case, we can change the numbering
of the unknowns and designate by $x_1$ some unknown that appears in some equation
with nonzero coefficient. After this completely elementary transformation, we will
have obtained that $a_{11} \ne 0$. For completeness, we should examine the extreme case
in which all unknowns appear in all equations with zero coefficients. But in that
case, the situation is trivial: all the equations take the form $0 = b_i$. If all the $b_i$ are 0,
then we have the identities $0 = 0$, which are satisfied for all values assigned to $x_i$,
that is, the system is consistent and indefinite. But if a single $b_i$ is not equal to
zero, then that $i$th equation is not satisfied for any values of the unknowns, and the
system is inconsistent.

Now let us perform a sequence of elementary row operations of type II, adding
to the second, third, and so on up to the $m$th equation the first equation multiplied
respectively by some numbers $c_2, c_3, \ldots, c_m$ in order to make the coefficient of $x_1$
in each of these equations equal to zero. It is clear that to do this, we must set
$c_2 = -a_{21}a_{11}^{-1}$, $c_3 = -a_{31}a_{11}^{-1}$, ..., $c_m = -a_{m1}a_{11}^{-1}$, which is possible because we
have ensured by hypothesis that $a_{11} \ne 0$. As a result, the unknown $x_1$ appears in
none of the equations except the first. We have thereby obtained a system that can
be written in the following form:




$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\
a'_{22}x_2 + \cdots + a'_{2n}x_n &= b'_2,\\
&\ \,\vdots\\
a'_{m2}x_2 + \cdots + a'_{mn}x_n &= b'_m.
\end{aligned} \tag{1.14}$$


Since system (1.14) was obtained from the original system (1.3) by elementary row
operations, it follows from Theorem 1.9 that the two systems are equivalent, that
is, the solution of an arbitrary system (1.3) has been reduced to the solution of the
simpler system (1.14). That is precisely the idea behind the method of Gaussian
elimination. It in fact reduces the problem to the solution of a system of $m - 1$
equations:


$$\begin{aligned}
a'_{22}x_2 + \cdots + a'_{2n}x_n &= b'_2,\\
&\ \,\vdots\\
a'_{m2}x_2 + \cdots + a'_{mn}x_n &= b'_m.
\end{aligned} \tag{1.15}$$


Now if system (1.15) is inconsistent, then clearly, the larger system (1.14) is also
inconsistent. If system (1.15) is consistent and we know the solution, then we can
obtain all solutions of system (1.14). Namely, if $x_2 = c_2, \ldots, x_n = c_n$ is any solution
of system (1.15), then we have only to substitute these values into the first equation
of the system (1.14). As a result, the first equation of system (1.14) takes the form


$$a_{11}x_1 + a_{12}c_2 + \cdots + a_{1n}c_n = b_1, \tag{1.16}$$


and we have one linear equation for the remaining unknown $x_1$, which can be solved
by the well-known formula


$$x_1 = a_{11}^{-1}(b_1 - a_{12}c_2 - \cdots - a_{1n}c_n),$$


which can be accomplished because $a_{11} \ne 0$. This reasoning is applicable in particular
to the case $m = 1$ (if we compare Gauss’s method with the method of proof by
induction, then this gives us the base case of the induction).

Thus the method of Gaussian elimination reduces the study of an arbitrary system 
of m equations in n unknowns to that of a system of m — 1 equations in n — 1 
unknowns. We shall illustrate this after proving several general theorems about such 
systems. 
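
The reduction described above translates directly into a recursive procedure. The following is a minimal sketch, not the authors' formulation: the function name `solve` is hypothetical, exact arithmetic is done with Python's `fractions.Fraction`, and, as a choice made for this sketch, every free unknown is set to 0, so that for an indefinite system only one of its infinitely many solutions is returned.

```python
from fractions import Fraction

def solve(M, n):
    """Return one solution (a list of n numbers) of the system whose augmented
    matrix is M, or None if the system is inconsistent."""
    M = [[Fraction(x) for x in row] for row in M]
    # Look for an unknown j and an equation i with nonzero coefficient.
    pivot = next(((i, j) for j in range(n) for i in range(len(M))
                  if M[i][j] != 0), None)
    if pivot is None:
        # Every equation reads 0 = b_i (this also covers a system with no equations).
        return [Fraction(0)] * n if all(row[n] == 0 for row in M) else None
    i, j = pivot
    M[0], M[i] = M[i], M[0]                      # type I operation
    first = M[0]
    rest = []
    for row in M[1:]:
        c = -row[j] / first[j]                   # the factor c_i = -a_i1 / a_11
        new = [x + c * y for x, y in zip(row, first)]   # type II operation
        rest.append(new[:j] + new[j + 1:])       # drop the eliminated unknown
    tail = solve(rest, n - 1)                    # the smaller system (1.15)
    if tail is None:
        return None
    others = first[:j] + first[j + 1:n]
    x_j = (first[n] - sum(a * t for a, t in zip(others, tail))) / first[j]
    return tail[:j] + [x_j] + tail[j:]           # formula following (1.16)

print(solve([[1, 2, 5], [3, -1, 1]], 2))    # a definite system
print(solve([[1, 1, 0], [1, 1, 1]], 2))     # an inconsistent system
print(solve([[1, -2, 1], [3, -6, 3]], 2))   # an indefinite system: one solution
```

Instead of renumbering the unknowns when $x_1$ is absent from every equation, the sketch simply eliminates the first unknown that does appear, which has the same effect.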


Theorem 1.10 If the number of unknowns in a system of equations is greater than
the number of equations, then the system is either inconsistent or indefinite. 


In other words, by Theorem 1.6, we know that the number of solutions of an
arbitrary system of linear equations is 0, 1, or infinity. If the number of unknowns
in a system is greater than the number of equations, then Theorem 1.10 asserts that
the only possible number of solutions is 0 or infinity.




Proof of Theorem 1.10 We shall prove the theorem by induction on the number m 
of equations in the system. Let us begin by considering the case m = 1, in which 
case we have a single equation: 


$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b_1. \tag{1.17}$$


We have $n > 1$ by hypothesis, and if even one $a_i$ is nonzero, then we can number
the unknowns in such a way that $a_1 \ne 0$. We then have the case of equation (1.16).
We saw that in this case, the system was consistent and indefinite.

But there remains one case to consider, that in which $a_i = 0$ for all $i = 1, \ldots, n$.
If in this case $b_1 \ne 0$, then clearly we have an inconsistent “system” (consisting of
a single inconsistent equation). If, however, $b_1 = 0$, then a solution consists of an
arbitrary sequence of numbers $x_1 = c_1, x_2 = c_2, \ldots, x_n = c_n$, that is, the “system”
(consisting of the equation $0 = 0$) is indefinite.

Now let us consider the case of m > 1 equations. We employ the method of 
Gaussian elimination. That is, after writing down our system in the form (1.3), we 
transform it into the equivalent system (1.14). The number of unknowns in the
system (1.15) is $n - 1$, and therefore larger than the number of equations $m - 1$, since
by the hypothesis of the theorem, $n > m$. This means that the hypothesis of the
theorem is satisfied for system (1.15), and by induction, we may conclude that the 
theorem is valid for this system. If system (1.15) is inconsistent, then all the more 
so is the larger system (1.14). If it is indefinite, that is, has more than one solution, 
then in the initial system there will be more than one solution; that is, system (1.3) 
will be indefinite. 


Let us now focus attention on an important special case of Theorem 1.10. A system
of linear equations is said to be homogeneous if all the constant terms are equal
to zero, that is, in (1.3), we have $b_1 = \cdots = b_m = 0$. A homogeneous system is
always consistent: it has the obvious solution $x_1 = \cdots = x_n = 0$. Such a solution is
called a null solution. We obtain the following corollary to Theorem 1.10.


Corollary 1.11 If in a homogeneous system, the number of unknowns is greater
than the number of equations, then the system has a solution that is different from 
the null solution. 


If we denote (as we have been doing) the number of unknowns by n and the 
number of equations by m, then we have considered the case n > m. Theorem 1.10 
asserts that for n > m, a system of linear equations cannot have a unique solution. 
Now we shall move on to consider the case n = m. We have the following rather 
surprising result. 


Theorem 1.12 If in a system of linear equations, the number of unknowns is equal
to the number of equations, then the property of having a unique solution depends 
only on the values of the coefficients and not on the values of the constant terms. 


Proof The result is easily obtained by Gaussian elimination. Let the system be
written in the form (1.3), with $n = m$. Let us deal separately with the case that all the
coefficients $a_{ik}$ are zero (in all equations), in which case the system cannot be uniquely
determined regardless of the constants $b_i$. Indeed, if even a single $b_i$ is not equal to
zero, then the $i$th equation gives an inconsistent equation; and if all the $b_i$ are zero,
then every choice of values for the $x_i$ gives a solution. That is, the system is
indefinite.

Let us prove Theorem 1.12 by induction on the number of equations ($m = n$). We
have already considered the case in which all the coefficients $a_{ik}$ are equal to zero.
We may therefore assume that among the coefficients $a_{ik}$, some are nonzero and
the system can be written in the equivalent form (1.14). But the solutions to (1.14)
are completely determined by system (1.15). In system (1.15), again the number of
equations is equal to the number of unknowns (both equal to $m - 1$). Therefore,
reasoning by induction, we may assume that the theorem has been proved for this
system. However, we have seen that consistency or definiteness of system (1.14)
was the same as that for system (1.15). In conclusion, it remains to observe that the
coefficients $a'_{ik}$ of system (1.15) are obtained from the coefficients of system (1.3)
by the formulas


$$a'_{2k} = a_{2k} - \frac{a_{21}}{a_{11}} a_{1k}, \quad a'_{3k} = a_{3k} - \frac{a_{31}}{a_{11}} a_{1k}, \quad \ldots, \quad a'_{mk} = a_{mk} - \frac{a_{m1}}{a_{11}} a_{1k}.$$


Thus the question of a unique solution is determined by the coefficients of the orig- 
inal system (1.3). 
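
The formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `eliminate_first_unknown` and the representation of a system as a list of rows $[a_{i1}, \ldots, a_{in}, b_i]$ are ours and do not come from the text, we assume that $a_{11} \neq 0$ has already been arranged, and we apply to the constant terms the same rule as to the coefficients.

```python
from fractions import Fraction

def eliminate_first_unknown(rows):
    """One step of Gaussian elimination on an augmented matrix (hypothetical helper).

    rows[i] = [a_i1, ..., a_in, b_i]; assumes rows[0][0] != 0.
    Returns the rows of the smaller system in x_2, ..., x_n.
    """
    first = [Fraction(v) for v in rows[0]]
    if first[0] == 0:
        raise ValueError("a_11 must be nonzero; renumber equations or unknowns first")
    reduced = []
    for row in rows[1:]:
        row = [Fraction(v) for v in row]
        c = row[0] / first[0]              # c_i = a_i1 / a_11
        # a'_ik = a_ik - c_i * a_1k for k >= 2, and likewise for the constant term
        reduced.append([row[k] - c * first[k] for k in range(1, len(row))])
    return reduced

# Three equations in three unknowns become two equations in two unknowns.
system = [[2, 1, -1, 8],
          [-3, -1, 2, -11],
          [-2, 1, 2, -3]]
smaller = eliminate_first_unknown(system)
```

Exact rational arithmetic (`Fraction`) is used so that the question of whether a coefficient is zero is not blurred by rounding. Note that the new coefficients are computed from the old coefficients alone, which is the point of the final remark in the proof.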


Theorem 1.12 can be reformulated as follows: if the number of equations is equal 
to the number of unknowns and the system has a unique solution for certain values 
of the constant terms $b_i$, then it has a unique solution for all possible values of the
constant terms. In particular, as a choice of these “certain” values we may take all 
the constants to be zero. Then we obtain a system with the same coefficients for the 
unknowns as in system (1.3), but now the system is homogeneous. Such a system is 
called the homogeneous system associated with system (1.3). We see, then, that if 
the number of equations is equal to the number of unknowns, then the system has 
a unique solution if and only if its associated system has a unique solution. Since 
a homogeneous system always has the null solution, its having a unique solution is 
equivalent to the absence of nonnull solutions, and we obtain the following result. 


Corollary 1.13 If in a system of linear equations, the number of equations is equal
to the number of unknowns, then it has a unique solution if and only if its associated 
homogeneous system has no solutions other than the null solution. 


This result is unexpected, since from the absence of a solution different from the 
null solution, it derives the existence and uniqueness of the solution to a different 
system (with different constant terms). In functional analysis, this result is called 
the Fredholm alternative.¹


¹More precisely, the Fredholm alternative comprises several assertions, one of which is analogous
to the one established above.




In order to focus on the theory behind the Gaussian method, we emphasized its 
“inductive” character: it reduces the study of a system of linear equations to an 
analogous system, but with fewer equations and unknowns. It is understood that in 
concrete examples, we must repeat the process, using this latter system and contin- 
uing until the process stops (that is, until it can no longer be applied). Now let us 
make clear for ourselves the form that the resulting system will take. 

When we transform system (1.3) into the equivalent system (1.14), it can happen
that not all the unknowns $x_2, \ldots, x_n$ enter into the corresponding system (1.15), that
is, some of the unknowns may have zero coefficients in all the equations. Moreover,
this is not easy to surmise from the original system (1.3). Let us denote by $k$
the first index of the unknown that appears with coefficients different from zero in at
least one equation of system (1.15). It is clear that $k > 1$. We can now apply the same
operations to this system. As a result, we obtain the following equivalent system:


$$\begin{aligned}
a_{11} x_1 + \cdots\cdots\cdots\cdots\cdots + a_{1n} x_n &= b_1, \\
a'_{2k} x_k + \cdots\cdots\cdots + a'_{2n} x_n &= b'_2, \\
a''_{3l} x_l + \cdots + a''_{3n} x_n &= b''_3, \\
\cdots\cdots\cdots & \\
a''_{ml} x_l + \cdots + a''_{mn} x_n &= b''_m.
\end{aligned}$$


Here we have already chosen $l > k$ such that in the system obtained by removing
the first two equations, the unknown $x_l$ appears with a coefficient different from
zero in at least one equation. In this case we will have $a_{11} \neq 0$, $a'_{2k} \neq 0$, $a''_{3l} \neq 0$, and
$l > k > 1$.

We shall repeat this process as long as possible. When shall we be forced to stop?
We stop after having applied the elementary operations up to the point (let us say
the $r$th equation, in which $x_s$ is the first unknown with nonzero coefficient) at which
we have reduced to zero all the coefficients of all subsequent unknowns in all the
remaining equations, that is, from the $(r + 1)$st to the $m$th. The system then has the
following form:


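$$\begin{aligned}
a_{11} x_1 + \cdots\cdots\cdots\cdots\cdots + a_{1n} x_n &= b_1, \\
a_{2k} x_k + \cdots\cdots\cdots\cdots + a_{2n} x_n &= b_2, \\
a_{3l} x_l + \cdots\cdots\cdots + a_{3n} x_n &= b_3, \\
\cdots\cdots\cdots & \\
a_{rs} x_s + \cdots + a_{rn} x_n &= b_r, \\
0 &= b_{r+1}, \\
\cdots\cdots\cdots & \\
0 &= b_m.
\end{aligned} \tag{1.18}$$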


Here $1 < k < l < \cdots < s$.




It can happen that $r = m$, and therefore, there will be no equations of the form
$0 = b_k$ in system (1.18). But if $r < m$, then it can happen that $b_{r+1} = 0, \ldots, b_m = 0$,
and it can finally be the case that one of the numbers $b_{r+1}, \ldots, b_m$ is different from
zero.


Definition 1.14 System (1.18) is said to be in (row) echelon form. The same termi- 
nology is applied to the matrix of such a system. 


Theorem 1.15 Every system of linear equations is equivalent to a system in echelon 
form (1.18). 


Proof Since we transformed the initial system into the form (1.18) using a sequence 
of elementary row operations, it follows from Theorem 1.9 that system (1.18) is 
equivalent to the initial system. 


Since any system of the form (1.3) is equivalent to system (1.18) in echelon 
form, questions about consistency and definiteness of systems can be answered by 
studying systems in echelon form. 

Let us begin with the question of consistency. It is clear that if system (1.18)
contains equations $0 = b_k$ with $b_k \neq 0$, then such a system is inconsistent, since the
equality $0 = b_k$ cannot be satisfied by any values of the unknowns. Let us show that
if there are no such equations in system (1.18), then the system is consistent. Thus
we now assume that in system (1.18), the last $m - r$ equations have been converted
into the identities $0 = 0$.

Let us call the unknowns $x_1, x_k, x_l, \ldots, x_s$ that begin the first, second, third, ...,
$r$th equations of system (1.18) principal, and the rest of the unknowns (if there are
any) we shall call free. Since each of the first $r$ equations in system (1.18) begins with its own
principal unknown, the number of principal unknowns is equal to $r$. We recall that
we have assumed $b_{r+1} = \cdots = b_m = 0$.

Let us assign arbitrary values to the free unknowns and substitute them in the 
equations of system (1.18). Since the $r$th equation contains only one principal
unknown $x_s$, and that with the coefficient $a_{rs}$, which is different from zero, we obtain
for $x_s$ one equation in one unknown, which has a unique solution. Substituting this
solution for $x_s$ into the equation above it, we obtain for that equation’s principal
unknown again one equation in one unknown, which also has a unique solution. 
Continuing in this way, moving from bottom to top in system (1.18), we see that the 
values of the principal unknowns are determined uniquely for an arbitrary assign- 
ment of the free unknowns. We have thus proved the following theorem. 


Theorem 1.16 For a system of linear equations to be consistent, it is necessary and 
sufficient, after it has been brought into echelon form, that there be no equations of 
the form $0 = b_k$ with $b_k \neq 0$. If this condition is satisfied, then it is possible to assign
arbitrary values to the free unknowns, while the values of the principal unknowns— 
for each given set of values for the free unknowns—are determined uniquely from 
the system. 
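
The procedure just described (reduction to echelon form, then determination of the principal unknowns from bottom to top) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names `to_echelon` and `solve_with_free_values` and the convention of numbering unknowns from 0 are our assumptions, and a system is passed as a list of rows $[a_{i1}, \ldots, a_{in}, b_i]$.

```python
from fractions import Fraction

def to_echelon(rows):
    """Reduce an augmented matrix [A | b] to row echelon form.

    Returns (echelon_rows, pivots); pivots lists the (0-based) indices of the
    principal unknowns, one for each equation with a nonzero left-hand side.
    """
    rows = [[Fraction(v) for v in row] for row in rows]
    m, n = len(rows), len(rows[0]) - 1
    pivots, r = [], 0
    for col in range(n):
        # look for an equation, among those not yet used, containing x_col
        src = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if src is None:
            continue                       # x_col will be a free unknown
        rows[r], rows[src] = rows[src], rows[r]
        for i in range(r + 1, m):
            c = rows[i][col] / rows[r][col]
            rows[i] = [x - c * y for x, y in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
        if r == m:
            break
    return rows, pivots

def solve_with_free_values(rows, free_values):
    """Solution for the given values of the free unknowns, or None if inconsistent.

    free_values maps the index of a free unknown to its value (default 0).
    """
    ech, pivots = to_echelon(rows)
    n, r = len(ech[0]) - 1, len(pivots)
    if any(row[n] != 0 for row in ech[r:]):
        return None                        # an equation 0 = b_k with b_k != 0
    x = [Fraction(free_values.get(j, 0)) for j in range(n)]
    for i in reversed(range(r)):           # from the bottom equation to the top
        col = pivots[i]
        rest = sum(ech[i][j] * x[j] for j in range(col + 1, n))
        x[col] = (ech[i][n] - rest) / ech[i][col]
    return x

# x1 + x2 + x3 = 6, 2*x1 + 2*x2 + x3 = 9: the second unknown is free.
solution = solve_with_free_values([[1, 1, 1, 6], [2, 2, 1, 9]], {1: 5})
```

In the example, the principal unknowns have indices 0 and 2, and each choice of the free unknown produces exactly one solution, in agreement with Theorem 1.16.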




Let us now explain when a system will be definite on the assumption that the 
condition of consistency that we have been investigating is satisfied. This question 
is easily answered on the basis of Theorem 1.16. Indeed, if there are free unknowns 
in system (1.18), then the system is certainly not definite, since we may give an arbi- 
trary assignment to each of the free unknowns, and by Theorem 1.16, the assignment 
of principal unknowns is then determined by the system. On the other hand, if there 
are no free unknowns, then all the unknowns are principal. By Theorem 1.16, they 
are uniquely determined by the system, which means that the system is definite. 
Consequently, a necessary and sufficient condition for definiteness is that there be 
no free unknowns in system (1.18). This, in turn, is equivalent to all unknowns in the 
system being principal. But that, clearly, is equivalent to the equality r =n, since r 
is the number of principal unknowns and n is the total number of unknowns. Thus 
we have proved the following assertion. 


Theorem 1.17 For a consistent system (1.3) to be definite, it is necessary and suffi- 
cient that for system (1.18), after it has been brought into echelon form, we have the 
equality r =n. 
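
Theorems 1.16 and 1.17 together give a mechanical test that sorts every system into one of three classes. The following sketch assumes the same row representation $[a_{i1}, \ldots, a_{in}, b_i]$ as before; the function name `classify` and the returned strings are our own choices for illustration.

```python
from fractions import Fraction

def classify(rows):
    """Return 'inconsistent', 'definite' (unique solution) or 'indefinite'."""
    rows = [[Fraction(v) for v in row] for row in rows]
    m, n = len(rows), len(rows[0]) - 1
    r = 0                                  # number of principal unknowns found so far
    for col in range(n):
        src = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if src is None:
            continue
        rows[r], rows[src] = rows[src], rows[r]
        for i in range(r + 1, m):
            c = rows[i][col] / rows[r][col]
            rows[i] = [x - c * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    if any(row[n] != 0 for row in rows[r:]):
        return "inconsistent"              # Theorem 1.16: some equation reads 0 = b_k != 0
    return "definite" if r == n else "indefinite"   # Theorem 1.17: definite iff r = n

# Same coefficients, different constant terms.
same_coefficients = [classify([[1, 1, 2], [2, 2, 4]]),
                     classify([[1, 1, 2], [2, 2, 5]])]
```

The last two calls use the same coefficients with different constant terms: neither system has a unique solution, as Theorem 1.12 predicts, although one is indefinite and the other inconsistent.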


Remark 1.18 Any system of n equations in n unknowns (that is, with m = n) 
brought into echelon form can be written in the form 


$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots\cdots\cdots + a_{1n} x_n &= b_1, \\
a_{22} x_2 + \cdots\cdots\cdots + a_{2n} x_n &= b_2, \\
\cdots\cdots\cdots & \\
a_{nn} x_n &= b_n
\end{aligned} \tag{1.19}$$


(however, not every system of the form (1.19) is in echelon form, since some of the
$a_{ii}$ can be zero). Indeed, the form (1.19) indicates that in the system, the $k$th equation
does not depend on the unknowns $x_i$ for $i < k$, and this condition is automatically
satisfied for a system in echelon form.

A system in the form (1.19) is said to be in upper triangular form. The same 
terminology is applied to the matrix of system (1.19). 


From this observation, we can state Theorem 1.17 in a different form for the
case $m = n$. The condition $r = n$ means that all the unknowns $x_1, x_2, \ldots, x_n$ are
principal, and that means that in system (1.19), the coefficients satisfy $a_{11} \neq 0, \ldots,$
$a_{nn} \neq 0$. This proves the following corollary.


Corollary 1.19 System (1.3) in the case $m = n$ is consistent and definite if and
only if after being brought into echelon form, we obtain the upper triangular system
(1.19) with coefficients $a_{11} \neq 0$, $a_{22} \neq 0$, ..., $a_{nn} \neq 0$.


We see that this condition is independent of the constant terms, and we thereby 
obtain another proof of Theorem 1.12 (though it is based on the same idea of the 
method of Gaussian elimination). 




Fig. 1.2. Graph of a 
polynomial passing through a 
given set of points 


1.3 Examples* 


We shall now give some examples of applications of the Gaussian method and with 
its aid obtain some new results for the investigation of concrete problems. 


Example 1.20 The expression

$$f = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,$$

where the $a_i$ are certain numbers, is called a polynomial in the unknown $x$. If
$a_n \neq 0$, then the number $n$ is called the degree of the polynomial $f$. If we
replace the unknown $x$ by some numerical value $x = c$, we obtain the number
$a_0 + a_1 c + a_2 c^2 + \cdots + a_n c^n$, which is called the value of the polynomial at $x = c$;
it is denoted by $f(c)$.

The following type of problem is frequently encountered: We are given two
collections of numbers $c_1, \ldots, c_r$ and $k_1, \ldots, k_r$ such that $c_1, \ldots, c_r$ are distinct. Is it
possible to find a polynomial $f$ such that $f(c_i) = k_i$ for $i = 1, \ldots, r$? The
process of constructing such a polynomial is called interpolation. This type of problem
is encountered when values of a certain variable are measured experimentally (for
example, temperature) at different moments of time $c_1, \ldots, c_r$. If such an
interpolation is possible, then the polynomial thus obtained provides a single formula for
temperature that coincides with the experimentally measured values.

We can provide a more graphic depiction of the problem of interpolation by 
stating that we are seeking a polynomial f(x) of degree n such that the graph of 
the function $y = f(x)$ passes through the given points $(c_i, k_i)$ in the Cartesian plane
for $i = 1, \ldots, r$ (see Fig. 1.2).

Let us write down the conditions of the problem explicitly: 


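$$\begin{aligned}
a_0 + a_1 c_1 + \cdots + a_n c_1^n &= k_1, \\
a_0 + a_1 c_2 + \cdots + a_n c_2^n &= k_2, \\
\cdots\cdots\cdots & \\
a_0 + a_1 c_r + \cdots + a_n c_r^n &= k_r.
\end{aligned} \tag{1.20}$$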


For the desired polynomial $f$ we obtain relationship (1.20), which is a system of
linear equations. The numbers $a_0, \ldots, a_n$ are the unknowns. The number of unknowns
is $n + 1$ (the numeration begins here not with the usual $a_1$, but with $a_0$). The
numbers 1 and $c_i^k$ are the coefficients of the unknowns, and $k_1, \ldots, k_r$ are the constant
terms.

If $r = n + 1$, then we are in the situation of Theorem 1.12 and its corollary.
Therefore, for $r = n + 1$, the interpolation problem has a solution, and a unique one,
if and only if the homogeneous system associated with (1.20) has only the null solution. This associated
system can be written in the form


$$\begin{aligned}
f(c_1) &= 0, \\
f(c_2) &= 0, \\
\cdots\cdots & \\
f(c_r) &= 0.
\end{aligned} \tag{1.21}$$


A number c for which f(c) = 0 is called a root of the polynomial f. A simple 
theorem of algebra (a corollary of what is known as Bézout’s theorem) states that 
a polynomial cannot have more distinct roots than its degree (except in the case 
that all the $a_i$ are equal to zero, in which case the degree is undefined). This means
(if the numbers $c_i$ are distinct, which is a natural assumption) that for $r = n + 1$,
equations (1.21) can be satisfied only if all the $a_i$ are zero. We obtain that under these
conditions, system (1.20) (that is, the interpolation problem) has a solution, and the
solution is unique. We note that it is not particularly difficult to obtain an explicit 
formula for the coefficients of the polynomial f. This will be done in Sects. 2.4 
and 2.5. 
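
Until that explicit formula is available, the coefficients can be found simply by solving system (1.20) by Gaussian elimination. The sketch below does this in the case $r = n + 1$; the names `interpolate` and `value` are ours, and we assume, as in the text, that the numbers $c_i$ are distinct (otherwise the search for a nonzero coefficient fails).

```python
from fractions import Fraction

def interpolate(points):
    """Coefficients [a_0, ..., a_n] of the polynomial of degree at most n
    through n + 1 points (c_i, k_i) with distinct c_i, from system (1.20)."""
    r = len(points)
    # row i of the augmented matrix is [1, c_i, c_i^2, ..., c_i^n, k_i]
    rows = [[Fraction(c) ** p for p in range(r)] + [Fraction(k)] for c, k in points]
    for col in range(r):                   # bring the system into triangular form
        src = next(i for i in range(col, r) if rows[i][col] != 0)
        rows[col], rows[src] = rows[src], rows[col]
        for i in range(col + 1, r):
            f = rows[i][col] / rows[col][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[col])]
    a = [Fraction(0)] * r
    for i in reversed(range(r)):           # determine the unknowns from bottom to top
        rest = sum(rows[i][j] * a[j] for j in range(i + 1, r))
        a[i] = (rows[i][r] - rest) / rows[i][i]
    return a

def value(a, c):
    """Value f(c) of the polynomial with coefficients a."""
    return sum(coef * Fraction(c) ** p for p, coef in enumerate(a))

points = [(0, 1), (1, 3), (2, 11)]
coefficients = interpolate(points)         # the polynomial 1 - x + 3x^2
```

For the three points chosen here, the polynomial found is $1 - x + 3x^2$, and one checks directly that it takes the values 1, 3, 11 at 0, 1, 2.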


The following example is somewhat more difficult. 


Example 1.21 Many questions in physics (such as the distribution of heat in a solid 
body if a known temperature is maintained on its surface, or the distribution of elec- 
tric charge on a body if a known charge distribution is maintained on its surface, and 
so on) lead to a single differential equation, called the Laplace equation. It is a partial 
differential equation, which we do not need to describe here. It suffices to mention 
one consequence, called the mean value property, according to which the value of 
the unknown quantity (satisfying the Laplace equation) is equal at every point to 
the arithmetic mean of its values at “nearby” points. We need not make precise here 
just what we mean by “nearby points” (suffice it to say that there are infinitely many 
of them, and this property is defined in terms of the integral). We will, however, 
present a method for an approximate solution of the Laplace equation. Solely for 
the purpose of simplifying the presentation, we shall consider the two-dimensional 
case instead of the three-dimensional situation described above. That is, instead of 
a three-dimensional body and its surface, we shall examine a two-dimensional fig- 
ure and its boundary; see Fig. 1.3(a). To construct an approximate solution in the 
plane, we form a lattice of identical small squares (the smaller the squares, the bet- 
ter the approximation), and the contour of the figure will be replaced by the closest 
approximation to it consisting of sides of the small squares; see Fig. 1.3(b). 




Fig. 1.3. Constructing an 
approximate solution to the 
Laplace equation 


(a) (b) 


Fig. 1.4 The “nearby vertices” to a are the points b, c, d, e


We examine the values of the unknown quantity (temperature, charge, etc.) only 
at the vertices of the small squares. Now the concept of “nearby points” acquires 
an unambiguous meaning: each vertex of a square of the lattice has exactly four 
nearby points, namely the “nearby” vertices. For example, in Fig. 1.4, the point a 
has nearby vertices b, c, d, e. 

We consider as given some quantities $x_a$ for all the vertices $a$ of the squares
intersecting the boundary (the thick straight lines in Fig. 1.3(b)), and we seek such values
for the vertices of the squares located inside this contour. Now an approximate
analogue of the mean value property for the point $a$ of Fig. 1.4 is the relationship

$$x_a = \frac{x_b + x_c + x_d + x_e}{4}. \tag{1.22}$$


There are thus as many unknowns as there are vertices inside the contour, and to 
each such vertex there corresponds an equation of type (1.22). This means that we 
have a system of linear equations in which the number of equations is equal to the 
number of unknowns. If one of the vertices $b, c, d, e$ is located on the contour, then
the corresponding quantity, one of $x_b, x_c, x_d, x_e$, must be assigned, and equation
(1.22) in this case is inhomogeneous. An assertion from the theory of linear
equations that we shall prove is that regardless of how we assign values on the boundary
of the figure, the associated system of linear equations always has a unique solution. 

We clearly find ourselves in the situation of Corollary 1.13, and so it suffices to
verify that the homogeneous system associated with ours has only the null solution.
The associated homogeneous system corresponds to the case in which all the values
on the boundary of the figure are equal to zero. Let us suppose that it has a solution
$x_1, \ldots, x_N$ (where $N$ is the number of equations) that is not the null solution. If
among the numbers $x_i$ there is at least one that is positive, then let us denote by $x_a$
the largest such number. Then equation (1.22) (in which any of $x_b, x_c, x_d, x_e$ will
equal zero if the associated point $b, c, d, e$ lies on the contour) can be satisfied only
if $x_b = x_c = x_d = x_e = x_a$, since the arithmetic mean does not exceed the maximum
of the numbers and is equal to it only when all the numbers coincide with that maximum.


Fig. 1.5 Simple contour for an approximate solution of the Laplace equation


Fig. 1.6 Electrical network

We can reason analogously for the point $b$, and we find that the value of each
nearby point is equal to $x_a$. By continuing to move to the right, we shall eventually
reach a point $p$ on the contour, for which we obtain $x_p = x_a > 0$. But that contradicts
the assumption that the value of $x_p$ for the point $p$ on the contour is equal to zero.
For example, for the simple contour of Fig. 1.5, we obtain the equalities $x_b = x_a$,
$x_c = x_b = x_a$, $x_d = x_a$, $x_e = x_a$, $x_p = x_a$, the last of which is impossible, since
$x_a > 0$, $x_p = 0$. If all the numbers $x_i$ in our solution are nonpositive but not all
equal to zero, then we can repeat the above argument with $x_a$ taken as the smallest
of them (the largest of the numbers in absolute value).
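
The system of equations (1.22) is easy to set up and solve on a computer. The sketch below does so for the simplest contour, a rectangle made up of `height` by `width` small squares. The function names, the labeling of vertices by pairs (row, column), and the restriction to a rectangle are our assumptions, made only to keep the illustration short.

```python
from fractions import Fraction

def solve_square_system(rows):
    """Gaussian elimination for n equations in n unknowns with a unique solution."""
    n = len(rows)
    rows = [[Fraction(v) for v in row] for row in rows]
    for col in range(n):
        src = next(i for i in range(col, n) if rows[i][col] != 0)
        rows[col], rows[src] = rows[src], rows[col]
        for i in range(col + 1, n):
            f = rows[i][col] / rows[col][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[col])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        rest = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - rest) / rows[i][i]
    return x

def discrete_laplace(height, width, boundary):
    """Values at the interior vertices of a rectangle of height x width squares.

    boundary(i, j) is the value assigned at the boundary vertex in row i, column j.
    Returns a dict {(i, j): value} over the interior vertices.
    """
    interior = [(i, j) for i in range(1, height) for j in range(1, width)]
    index = {v: k for k, v in enumerate(interior)}
    rows = []
    for (i, j) in interior:
        row = [0] * (len(interior) + 1)
        row[index[(i, j)]] = 4             # equation (1.22): 4 x_a - x_b - x_c - x_d - x_e = 0
        for nb in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if nb in index:
                row[index[nb]] -= 1
            else:
                row[-1] += boundary(*nb)   # an assigned value goes to the right-hand side
        rows.append(row)
    return dict(zip(interior, solve_square_system(rows)))

# Boundary values equal to the column number: the solution is again the column number.
values = discrete_laplace(3, 4, lambda i, j: j)
```

With boundary values equal to the column number, the quantity "column number" itself satisfies (1.22) at every interior vertex, so by uniqueness it is the solution. With zero boundary values the function returns the null solution, as the argument above requires.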

The above arguments can be applied to proving the existence of a solution to the
Laplace equation (by passage to the limit).²


²Such a proof was given by Lyusternik, and both the proof and the argument we have given here
are taken from I.G. Petrovsky’s book Lectures on Partial Differential Equations, Dover Books on
Mathematics, 1992.


Example 1.22 This example concerns electrical networks. Such a network (see
Fig. 1.6) consists of conductors, each of which we shall consider to be uniform,
connected together at points called nodes. At one point in the network, a direct
current $i$ enters, while at another point, current $j$ exits. A uniform current flows due to
the homogeneity of each conductor.


Fig. 1.7 Decomposable network

We shall designate the conductors by the Greek letters $\alpha, \beta, \gamma, \ldots$, and the
strength of the current in conductor $\alpha$ by $i_\alpha$. Knowing the current $i$, we would like
to find the currents $i_\alpha, i_\beta, i_\gamma, \ldots$ for all the conductors $\alpha, \beta, \gamma, \ldots$ in the network,
and the current $j$. We shall denote the nodes of the network by $a, b, c, \ldots$.

We need to make one additional refinement here. Since the current in a conductor 
flows in a particular direction, it makes sense to indicate the direction with a sign. 
This choice is arbitrary for each conductor, and we designate the direction by an 
arrow. The nodes joined by a conductor are called its beginning and end, and the 
arrow points from the beginning of the conductor to the end. The beginning of the 
conductor $\alpha$ will be denoted by $\alpha'$, and the end will be denoted by $\alpha''$. The current
$i_\alpha$ will be considered positive if it flows in the direction of the arrow, and will be
considered negative otherwise. We shall say that the current $i_\alpha$ flows out of node
$a$ (flows into node $a$) if the conductor $\alpha$ has beginning (end) node $a$. For
example, in Fig. 1.6, the current $i_\alpha$ flows out of $a$ and flows into $b$; thus according
to our notation, $\alpha' = a$ and $\alpha'' = b$.

We shall assume further that the network in question satisfies the following
natural condition: Two arbitrary nodes $a$ and $b$ can be connected by some set of nodes
$c_1, \ldots, c_n$ in such a way that each of the pairs $a, c_1$; $c_1, c_2$; $\ldots$; $c_{n-1}, c_n$; $c_n, b$ is
connected by a conductor. We shall call this property of the network connectedness.
A network not satisfying this condition can be decomposed into a number of
subnetworks each of whose nodes is not connected to any node of any other subnetwork
(Fig. 1.7). We may then consider each subnetwork individually.

A collection of nodes $a_1, \ldots, a_n$ and connecting conductors $\alpha_1, \ldots, \alpha_n$ such that
conductor $\alpha_1$ connects nodes $a_1$ and $a_2$, conductor $\alpha_2$ connects nodes $a_2$ and $a_3$, ...,
conductor $\alpha_{n-1}$ connects nodes $a_{n-1}$ and $a_n$, and conductor $\alpha_n$ connects nodes $a_n$
and $a_1$ is called a closed circuit. For example, in Fig. 1.6, it is possible to select as
a closed circuit nodes $a, b, c, d, h$ and conductors $\alpha, \beta, \gamma, \xi, \eta$, or else, for example,
nodes $e, g, h, d$ and conductors $\mu, \vartheta, \xi, \delta$. The distribution of current in the closed
circuit is determined by two well-known laws of physics: Kirchhoff’s laws.

Kirchhoff’s first law applies to each node of a network and asserts that the sum
of the currents flowing into a node is equal to the sum of the currents flowing out of it.
More precisely, the sum of the currents in the conductors that have node $a$ at their
end is equal to the sum of the currents in the conductors for which node $a$ is the
beginning. This can be expressed by the following formula:


$$\sum_{\alpha' = a} i_\alpha - \sum_{\beta'' = a} i_\beta = 0 \tag{1.23}$$

for every node a. For example, in Fig. 1.6, for the node e we obtain the equation 
ig —i3 —i, —i, =O. 


Kirchhoff’s second law applies to an arbitrary closed circuit consisting of
conductors in a network. Namely, if the conductors $\alpha_l$ form a circuit $C$, then with a
direction of such a circuit having been assigned, the law is expressed by the
equation

$$\sum_{\alpha_l \in C} \pm p_{\alpha_l} i_{\alpha_l} = 0, \tag{1.24}$$

where $p_{\alpha_l}$ is the resistance of the conductor $\alpha_l$ (which is always a positive
number, since the conductors are homogeneous), and where the plus sign is taken if the
selected direction of the conductor (indicated by an arrow) coincides with the
direction assigned to the circuit, and the minus sign is taken if it is opposite to that
direction. For example, for the closed circuit $C$ with nodes $e, g, h, d$
as shown in Fig. 1.6 and with the indicated direction of the circuit, Kirchhoff’s law
gives the equation


$$-p_\mu i_\mu + p_\vartheta i_\vartheta - p_\xi i_\xi + p_\delta i_\delta = 0. \tag{1.25}$$


We thereby obtain a system of linear equations in which the unknowns are
$i_\alpha, i_\beta, i_\gamma, \ldots$ and $j$. Such a system of equations is encountered in a number of
problems, such as the allocation of loads in a transport network and the distribution of
water in a system of conduits.

Our goal is now to show that the system of equations thus obtained (for the given 
network and current $i$) has a unique solution.

First, we observe that the outflowing current $j$ is equal to $i$. This is obvious from
physical considerations, but we must derive it from the equations of Kirchhoff’s
law. To this end, let us add together all equations (1.23) of Kirchhoff’s first law for all
nodes $a$ of our network. How often do we encounter conductor $\alpha$ in the resulting
equation? We encounter it once when we examine the equation corresponding to the
node $a = \alpha'$, and another time for $a = \alpha''$. Furthermore, the current $i_\alpha$ enters into
the two equations with opposite signs, which means that they cancel. All that will
remain in the resulting equation is the current $i$ (for the point into which the current
flows) and $-j$ (for the point where the current flows out). This yields the equation
$i - j = 0$, that is, $i = j$.

Now let us note that not all the equations (1.24) corresponding to Kirchhoff’s
second law are independent. We shall call a closed circuit $\alpha_1, \ldots, \alpha_n$ a cell if
every pair of its nodes is connected only by a conductor from among $\alpha_1, \ldots, \alpha_n$ and
by no others. Every closed circuit can be decomposed into a number of cells. For
example, in Fig. 1.6, the circuit $C$ with nodes $e, g, h, d$ and conductors $\mu, \vartheta, \xi, \delta$
can be decomposed into two cells: one with nodes $e, g, h$ and conductors $\mu, \vartheta, \lambda$,
and the other with nodes $e, h, d$ and conductors $\lambda, \xi, \delta$. In this case, equation (1.24)
corresponding to the circuit is the sum of the equations corresponding to the
individual cells (with a proper choice of directions for the circuits). For example, equation
(1.25) for the circuit $C$ with nodes $e, g, h, d$ is the sum of equations

$$-p_\mu i_\mu + p_\vartheta i_\vartheta + p_\lambda i_\lambda = 0, \qquad -p_\lambda i_\lambda - p_\xi i_\xi + p_\delta i_\delta = 0,$$

corresponding to the cells with nodes $e, g, h$ and $e, h, d$.


Fig. 1.8 Circuits for the proof of Euler’s theorem


(a) (b)

Thus, we can restrict our attention to equations of the cells of the network. Let us
prove, then, that in the entire system of equations (1.23) and (1.24) corresponding
to Kirchhoff’s first and second laws, the number of equations will be equal to the
number of unknowns. We shall denote by $N_{\mathrm{cell}}$, $N_{\mathrm{cond}}$, and $N_{\mathrm{node}}$ the numbers of
cells, conductors, and nodes of the network. The number of unknowns $i_\alpha$ and $j$ is
equal to $N_{\mathrm{cond}} + 1$. Each cell and each node contributes one equation. This means
that the number of equations is equal to $N_{\mathrm{cell}} + N_{\mathrm{node}}$, and we need to prove the
equality


$$N_{\mathrm{cell}} + N_{\mathrm{node}} = N_{\mathrm{cond}} + 1. \tag{1.26}$$


This is a familiar equality. It comes from topology and is known as Euler’s theorem. 
It is very easy to prove, as we shall now demonstrate. 

Let us make the important observation that our network is located in the plane: 
the conductors do not have to be straight line segments, but they are required to 
be nonintersecting curves in the plane. We shall use induction on the number of 
cells. Let us delete the “outer” side of one of the “external” cells (for example, side 
$(b, c, d)$ in Fig. 1.8(a)). In this case, the number of cells $N_{\mathrm{cell}}$ is reduced by 1.

If in the “deleted” side there were $k$ conductors, then the number $N_{\mathrm{cond}}$ will
decrease by $k$, while the number $N_{\mathrm{node}}$ will decrease by $k - 1$. Altogether, the number
$N_{\mathrm{cell}} - N_{\mathrm{cond}} + N_{\mathrm{node}} - 1$ does not change. In this process, the property of
connectedness is not destroyed. Indeed, any two nodes of the initial network can be
connected by the sequence of nodes $c_1, \ldots, c_n$. If even part of this sequence
consisted of vertices of the “deleted” sides of our cell, then we could replace them with
the sequence of nodes of its “nondeleted” sides. 




Fig. 1.9 Closed circuit containing nodes x and t


This process reduces the proof to the case $N_{\mathrm{cell}} = 0$, that is, to a network that
does not contain a closed circuit. We now must prove that for such a network,
$N_{\mathrm{node}} - N_{\mathrm{cond}} = 1$. We now use induction on the number $N_{\mathrm{cond}}$. Let us remove
any “external” conductor at least one end of which is not the end of another
conductor (for example, the conductor $\alpha$ in Fig. 1.8(b)). Then both numbers $N_{\mathrm{cond}}$ and
$N_{\mathrm{node}}$ are reduced by 1, and the number $N_{\mathrm{cond}} - N_{\mathrm{node}}$ remains unchanged. We may
easily convince ourselves that in this case, the property of connectedness is again
preserved. As a result, we arrive at the case $N_{\mathrm{cond}} = 0$ but $N_{\mathrm{node}} > 0$. Since the
network must be connected, we have $N_{\mathrm{node}} = 1$, and it is clear that we have the equality
$N_{\mathrm{node}} - N_{\mathrm{cond}} = 1$.
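
Both halves of this argument can be checked on small examples. In the sketch below, `grid_counts` counts cells, nodes, and conductors for a rectangular lattice of squares (a network for which all three numbers are evident), and `prune_tree` carries out the removal of “external” conductors from a connected network without closed circuits, recording $N_{\mathrm{node}} - N_{\mathrm{cond}}$ at each step. Both function names and the representation of a conductor as a pair of node labels are our assumptions.

```python
def grid_counts(k, l):
    """(N_cell, N_node, N_cond) for a lattice of k rows and l columns of squares."""
    n_cell = k * l
    n_node = (k + 1) * (l + 1)
    n_cond = l * (k + 1) + k * (l + 1)     # horizontal sides plus vertical sides
    return n_cell, n_node, n_cond

def prune_tree(conductors):
    """Remove external conductors one at a time from a connected network with
    no closed circuit; return the values of N_node - N_cond seen along the way."""
    conductors = list(conductors)
    nodes = {v for c in conductors for v in c}
    history = [len(nodes) - len(conductors)]
    while conductors:
        degree = {}
        for a, b in conductors:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        # an external conductor: one of its ends belongs to no other conductor
        k = next(i for i, (a, b) in enumerate(conductors)
                 if degree[a] == 1 or degree[b] == 1)
        a, b = conductors.pop(k)
        nodes.discard(a if degree[a] == 1 else b)   # that end disappears with it
        history.append(len(nodes) - len(conductors))
    return history

history = prune_tree([("a", "b"), ("b", "c"), ("b", "d")])
```

For the lattice, equality (1.26) reads $kl + (k+1)(l+1) = 2kl + k + l + 1$, which is an identity. For the small network in the example, the recorded difference stays equal to 1 until a single node remains.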

We now note an important property of networks satisfying relationship (1.24)
that emerges from Kirchhoff’s second law (for given currents $i_\alpha$). With each node $a$
one can associate a number $r_a$ such that for an arbitrary conductor $\alpha$ beginning at $a$
and ending at $b$, the following equation is satisfied:


$$p_\alpha i_\alpha = r_a - r_b. \tag{1.27}$$


To determine these numbers $r_a$, we shall choose some node $x$ and assign to it the
number $r_x$ arbitrarily. Then for each node $y$ connected to $x$ by some conductor $\alpha$,
we set $r_y = r_x - p_\alpha i_\alpha$ if $x$ is at the beginning of $\alpha$ and $y$ at the end, and $r_y =
r_x + p_\alpha i_\alpha$ in the opposite case. Then in exactly the same way, we determine the
number $r_z$ for each node $z$ connected by a conductor to one of the examined nodes
$x, y$, etc. In view of the connectedness condition, we will eventually reach every
node $t$ of our network, to which we will have assigned, say, the number $r_t$. But it
is still necessary to show that this number $r_t$ is independent of the path by which
we arrive from $x$ to $t$ (that is, which point we chose as $y$, then as $z$, and so on). To
accomplish this, it suffices to note that a pair of distinct paths linking nodes $x$ and
$t$ forms a closed circuit (Fig. 1.9), and the relationship that we require follows from
Kirchhoff’s second law (equations (1.24)).

It is now easy to show that the system of linear equations (1.23) obtained from 
Kirchhoff’s first law for all nodes and from Kirchhoff’s second law (1.24) for all 
cells has a unique solution. To do so, it suffices, as we know, to show that the asso- 
ciated homogeneous system has only the null solution. This homogeneous system 
is obtained for i = j = 0. 

Of course, “physically,” it is completely obvious that if we put no current into the 
network, then there will be no current in its conductors, but we must prove that this 
follows in particular from Kirchhoff’s laws. 




To this end, consider the sum $\sum_\alpha p_\alpha i_\alpha^2$, where the sum is over all conductors
of our network. Let us break the term $p_\alpha i_\alpha^2$ into two factors: $p_\alpha i_\alpha^2 = (p_\alpha i_\alpha) \cdot i_\alpha$.
We replace the first factor by $r_a - r_b$ on the basis of relation (1.27), where $a$ is
the beginning and $b$ the end of conductor $\alpha$. We obtain the sum $\sum_\alpha (r_a - r_b) i_\alpha$,
and we collect the terms in which the first factor $r_a$ or $-r_b$ is associated with a
particular node $c$. Then we can pull the number $r_c$ outside the parentheses, and
inside will remain the sum of the currents $i_\alpha$ over all conductors $\alpha$ beginning at $c$
minus the sum of the currents $i_\beta$ over all conductors $\beta$ ending at $c$, which is equal to zero on account
of Kirchhoff’s first law (1.23). We finally obtain that $\sum_\alpha p_\alpha i_\alpha^2 = 0$, and since the
resistance $p_\alpha$ is positive, all the currents $i_\alpha$ must be equal to zero.

To conclude, we remark that networks appearing in mathematics are called 
graphs, and “conductors” become the edges of the graph. In the case that every 
edge of a graph is assigned a direction (provided with arrows, for example), the 
graph is then said to be directed. This theorem holds not for arbitrary graphs, but 
only for those, like the networks that we have considered in this example, that can 
be drawn in the plane without intersections of edges (for which we omit a precise 
definition). Such graphs are called planar. 


Chapter 2 
Matrices and Determinants 


2.1 Determinants of Orders 2 and 3 


We begin by considering a system of two equations in two unknowns: 


\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 &= b_1,\\
a_{21}x_1 + a_{22}x_2 &= b_2.
\end{aligned}
\]

In order to determine $x_1$, we attempt to eliminate $x_2$ from the system. To accomplish
this, it suffices to multiply the first equation by $a_{22}$ and add to it the second equation
multiplied by $-a_{12}$. We obtain

\[
(a_{11}a_{22} - a_{21}a_{12})x_1 = b_1a_{22} - b_2a_{12}.
\]

We consider the case in which $a_{11}a_{22} - a_{21}a_{12} \neq 0$. Then we obtain

\[
x_1 = \frac{b_1a_{22} - b_2a_{12}}{a_{11}a_{22} - a_{21}a_{12}}. \tag{2.1}
\]

Analogously, to find the value $x_2$, we multiply the second equation by $a_{11}$ and add
to it the first multiplied by $-a_{21}$. With the same assumption ($a_{11}a_{22} - a_{21}a_{12} \neq 0$),
we obtain

\[
x_2 = \frac{b_2a_{11} - b_1a_{21}}{a_{11}a_{22} - a_{21}a_{12}}. \tag{2.2}
\]

The expression $a_{11}a_{22} - a_{21}a_{12}$ appearing in the denominator of formulas (2.1)
and (2.2) is called the determinant of the matrix $\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ (it is called a determinant
of order 2, or a $2 \times 2$ determinant) and is denoted by $\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$. Therefore, we have
by definition,

\[
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{21}a_{12}. \tag{2.3}
\]
I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry,
DOI 10.1007/978-3-642-30994-6_2, © Springer-Verlag Berlin Heidelberg 2013




Fig. 2.1 Calculating (a) the area of a triangle and (b) the volume of a tetrahedron


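We see that in the numerators of formulas (2.1) and (2.2) there also appears an
expression of the form (2.3). Using the notation we have introduced, we can rewrite
these formulas in the following form:

\[
x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}, \qquad
x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}. \tag{2.4}
\]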

The expression (2.3) is useful for more than a symmetric way of writing solutions
of two equations in two unknowns. It is encountered in a great number of situations,
and therefore has a special name and notation. For example, consider two points $A$
and $B$ in the plane with respective coordinates $(x_1, y_1)$ and $(x_2, y_2)$; see Fig. 2.1(a).
It is not difficult to see that the area of triangle $OAB$ is equal to $(x_1y_2 - y_1x_2)/2$. For
example, we could subtract from the area of triangle $OBD$ the area of the rectangle
$ACDE$ and the areas of triangles $ABC$ and $OAE$. We thereby obtain

\[
\triangle OAB = \frac{1}{2}\begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}.
\]
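As a quick numerical illustration of the area formula, here is a small sketch. The function name and the sample points are assumptions chosen for the example. Note that the expression $(x_1y_2 - y_1x_2)/2$ changes sign when $A$ and $B$ are exchanged, so it gives the area itself only for one of the two orderings of the points; this remark follows from (2.3) and is not spelled out in the text.

```python
def half_determinant(x1, y1, x2, y2):
    """Half of the determinant |x1 y1; x2 y2| for points A = (x1, y1), B = (x2, y2)."""
    return (x1 * y2 - y1 * x2) / 2


# Hypothetical points A = (4, 1), B = (1, 3); O is the origin.
print(half_determinant(4, 1, 1, 3))   # prints: 5.5
print(half_determinant(1, 3, 4, 1))   # same triangle, A and B exchanged: -5.5
```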


Having in hand formulas for solutions of systems of two equations in two
unknowns, we can solve some other systems. Consider, for example, the following
homogeneous system of linear equations in three unknowns:

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= 0,\\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= 0.
\end{aligned} \tag{2.5}
\]

We are interested in nonnull solutions of this system, that is, solutions in which at
least one $x_i$ is not equal to zero. Suppose, for example, that $x_3 \neq 0$. Dividing both
sides by $-x_3$ and setting $-x_1/x_3 = y_1$, $-x_2/x_3 = y_2$, we can write system (2.5) in
the form

\[
\begin{aligned}
a_{11}y_1 + a_{12}y_2 &= a_{13},\\
a_{21}y_1 + a_{22}y_2 &= a_{23},
\end{aligned}
\]




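which is in a form we have considered. If $\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \neq 0$, then formula (2.4) gives the
expressions

\[
y_1 = -\frac{x_1}{x_3} = \frac{\begin{vmatrix} a_{13} & a_{12} \\ a_{23} & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}, \qquad
y_2 = -\frac{x_2}{x_3} = \frac{\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}.
\]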


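Unsurprisingly, we determined from system (2.5) not $x_1, x_2, x_3$, but only their
mutual relationships: from such a homogeneous system, it easily follows that if
$(c_1, c_2, c_3)$ is a solution and $p$ is an arbitrary number, then $(pc_1, pc_2, pc_3)$ is also a
solution. Therefore, we can set

\[
x_1 = -\begin{vmatrix} a_{13} & a_{12} \\ a_{23} & a_{22} \end{vmatrix}, \qquad
x_2 = -\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}, \qquad
x_3 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \tag{2.6}
\]

and say that an arbitrary solution is obtained from this one by multiplying all the $x_i$
by $p$. In order to give our solution a somewhat more symmetric form, we observe
that we always have

\[
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = -\begin{vmatrix} b & a \\ d & c \end{vmatrix}.
\]

This is easily checked with the help of formula (2.3). Therefore, (2.6) can be written
in the form

\[
x_1 = \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}, \qquad
x_2 = -\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}, \qquad
x_3 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}. \tag{2.7}
\]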


Formulas (2.7) give values for $x_1, x_2, x_3$ if in the matrix of coefficients of system (2.5)
we cross out in turn the first, second, and
third columns and then take the obtained second-order determinants with alternating
signs. We recall that these formulas were obtained on the assumption that

\[
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \neq 0.
\]


It is easy to check that the assertion we have proved is valid if at least one of the three 
determinants appearing in (2.7) is not equal to zero. If all three determinants are 
zero, then, of course, formula (2.7) again gives a solution, namely the null solution, 
but now we can no longer assert that all solutions are obtained by multiplying by a 
number (indeed, this is not true). 
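The rule "cross out a column and alternate the signs" is easy to check numerically. The sketch below evaluates a solution of a homogeneous system of two equations in three unknowns by crossing out columns in this way; the function names and the coefficients are assumptions chosen for illustration.

```python
def det2(a, b, c, d):
    """Second-order determinant |a b; c d| = a*d - c*b."""
    return a * d - c * b


def homogeneous_solution(row1, row2):
    """A solution (x1, x2, x3) of two homogeneous equations in three unknowns."""
    a11, a12, a13 = row1
    a21, a22, a23 = row2
    x1 = det2(a12, a13, a22, a23)    # first column crossed out
    x2 = -det2(a11, a13, a21, a23)   # second column crossed out, sign reversed
    x3 = det2(a11, a12, a21, a22)    # third column crossed out
    return x1, x2, x3


# Hypothetical coefficients: x1 + 2*x2 + 3*x3 = 0,  4*x1 + 5*x2 + 6*x3 = 0.
row1, row2 = (1, 2, 3), (4, 5, 6)
x = homogeneous_solution(row1, row2)
print(x)  # prints: (-3, 6, -3)

# Both equations are satisfied by the computed triple.
print(sum(a * v for a, v in zip(row1, x)), sum(a * v for a, v in zip(row2, x)))
```

Any multiple of the triple printed above, for instance $(1, -2, 1)$, is again a solution, in agreement with the remark about multiplying by a number $p$.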

Let us now consider the case of a system of three equations in three unknowns: 


\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1,\\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2,\\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3.
\end{aligned}
\]

We again would like to eliminate $x_2$ and $x_3$ from the system in order to obtain a
value for $x_1$. To this end, we multiply the first equation by $c_1$, the second by $c_2$,
and the third by $c_3$ and add them. We shall therefore choose $c_1$, $c_2$, and $c_3$ such that
in the system obtained, the terms with $x_2$ and $x_3$ become equal to zero. Setting the
associated coefficients to zero, we obtain for $c_1$, $c_2$, and $c_3$ the following system of
equations:

\[
\begin{aligned}
a_{12}c_1 + a_{22}c_2 + a_{32}c_3 &= 0,\\
a_{13}c_1 + a_{23}c_2 + a_{33}c_3 &= 0.
\end{aligned}
\]


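This system is of the same type as (2.5). Therefore, we can use the formula (2.6)
that we derived, in its symmetric form (2.7), and take

\[
c_1 = \begin{vmatrix} a_{22} & a_{32} \\ a_{23} & a_{33} \end{vmatrix}, \qquad
c_2 = -\begin{vmatrix} a_{12} & a_{32} \\ a_{13} & a_{33} \end{vmatrix}, \qquad
c_3 = \begin{vmatrix} a_{12} & a_{22} \\ a_{13} & a_{23} \end{vmatrix}.
\]

As a result, we obtain for $x_1$ the equation

\[
\left( a_{11}\begin{vmatrix} a_{22} & a_{32} \\ a_{23} & a_{33} \end{vmatrix}
- a_{21}\begin{vmatrix} a_{12} & a_{32} \\ a_{13} & a_{33} \end{vmatrix}
+ a_{31}\begin{vmatrix} a_{12} & a_{22} \\ a_{13} & a_{23} \end{vmatrix} \right) x_1
= b_1\begin{vmatrix} a_{22} & a_{32} \\ a_{23} & a_{33} \end{vmatrix}
- b_2\begin{vmatrix} a_{12} & a_{32} \\ a_{13} & a_{33} \end{vmatrix}
+ b_3\begin{vmatrix} a_{12} & a_{22} \\ a_{13} & a_{23} \end{vmatrix}. \tag{2.8}
\]

The coefficient of $x_1$ in (2.8) is called the determinant of the matrix

\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\]

and is denoted by

\[
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.
\]

Therefore, by definition,

\[
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
- a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}
+ a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}. \tag{2.9}
\]

(Here we have used the fact, evident from formula (2.3), that a second-order
determinant does not change when its rows and columns are interchanged.)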


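It is clear that the right-hand side of equation (2.8) is obtained from the coefficient
of $x_1$ by replacing $a_{i1}$ with $b_i$, $i = 1, 2, 3$. Therefore, equality (2.8) can be written
in the form

\[
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} x_1
= \begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}.
\]

We shall assume that the coefficient of $x_1$, that is, the determinant (2.9), is different
from zero. Then we have

\[
x_1 = \frac{\begin{vmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}. \tag{2.10}
\]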


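We can easily carry out the same calculations for $x_2$ and $x_3$. We obtain then the
formulas

\[
x_2 = \frac{\begin{vmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}, \qquad
x_3 = \frac{\begin{vmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}.
\]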
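The formulas for $x_1$, $x_2$, $x_3$ can be tried out on a concrete system. The following sketch computes third-order determinants by expansion along the first column and then replaces one column at a time by the right-hand sides. The names `det2`, `det3`, `cramer3` and the sample system are assumptions made for illustration only.

```python
from fractions import Fraction


def det2(a, b, c, d):
    """Second-order determinant |a b; c d| = a*d - c*b."""
    return a * d - c * b


def det3(m):
    """Third-order determinant, expanded along the first column as in (2.9)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = m
    return (a11 * det2(a22, a23, a32, a33)
            - a21 * det2(a12, a13, a32, a33)
            + a31 * det2(a12, a13, a22, a23))


def cramer3(m, b):
    """Solve a 3x3 system by replacing each column in turn with the column b."""
    d = det3(m)
    if d == 0:
        raise ValueError("determinant is zero; the formulas do not apply")
    solution = []
    for k in range(3):
        mk = [list(row) for row in m]
        for i in range(3):
            mk[i][k] = b[i]          # k-th column replaced by b
        solution.append(Fraction(det3(mk), d))
    return solution


# Hypothetical system constructed so that its solution is (1, 2, 3).
m = [[1, 1, 1], [0, 2, 5], [2, 5, -1]]
b = [6, 19, 9]
print(det3(m))                         # prints: -21
print([str(v) for v in cramer3(m, b)]) # prints: ['1', '2', '3']
```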


Just as second-order determinants express area, third-order determinants enter
into a number of formulas for volume. For example, the volume of a tetrahedron
with vertices at the points $O$ (the coordinate origin) and $A$, $B$, $C$ with coordinates
$(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$ (see Fig. 2.1(b)), is equal to

\[
\frac{1}{6}\begin{vmatrix} x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \\ x_3 & y_3 & z_3 \end{vmatrix}.
\]


This shows that the notion of determinant that we have introduced is encountered 
in a number of branches of mathematics. We now return to the problem of solving 
systems of n linear equations in n unknowns. 

It is clear that we can apply the same line of reasoning to a system consisting of
four equations in four unknowns. To do so, we need to derive formulas analogous to
(2.7) for the solution of a homogeneous system of three equations in four unknowns
based on formula (2.9). Then to eliminate $x_2, x_3, x_4$ in a system of four equations in
four unknowns, we multiply the equations by the coefficients $c_1, c_2, c_3, c_4$ and add.
The coefficients $c_1, c_2, c_3, c_4$ will satisfy a homogeneous system of three equations,
which we are able to solve. This will give us uniquely solvable linear equations in
the unknowns $x_1, \ldots, x_4$ (as in the previous cases with two and three variables, the
idea is the same for any number of unknowns). We call the coefficient of the
unknowns a fourth-order determinant. Solving the linear equations thus obtained, we
arrive at formulas expressing the values of the unknowns $x_1, \ldots, x_4$, analogous to
formula (2.10). Thus it is possible to obtain solutions to systems with an arbitrarily
large number of equations and with the same number of unknowns.

To derive a formula for the solution of $n$ equations in $n$ unknowns, we have to
introduce the notion of the determinant of the $n \times n$ square matrix

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}, \tag{2.11}
\]

that is, a determinant of order $n$.

Our previous analysis suggests that we define the $n \times n$ determinant by induction:
For $n = 1$, we consider the determinant of the matrix $(a_{11})$ to be equal to the number
$a_{11}$, and assuming that the determinant of order $n - 1$ has been defined, we proceed
to define the determinant of order $n$.

Formulas (2.3) and (2.9) suggest how this should be done. In both formulas,
the determinant of order $n$ (that is, two or three) was expressed in the form of an
algebraic sum of elements of the first column of matrix (2.11) (that is, of elements
$a_{11}, a_{21}, \ldots, a_{n1}$) multiplied by determinants of order $n - 1$. The determinant of
order $n - 1$ by which a given element of the first column was multiplied was obtained
by deleting from the original matrix the first column and the row in which the given
element was located. Then the $n$ products were added with alternating signs.

We shall give a general definition of an $n \times n$ determinant in the following
section. The sole purpose of the discussion above was to make such a definition
intelligible. The formulas introduced in this section will not be used again in this book.
Indeed, they will be corollaries of formulas that we shall derive for determinants of
arbitrary order.


2.2 Determinants of Arbitrary Order 


A determinant of the square $n \times n$ matrix

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\]

is a number associated with the given matrix. It is defined inductively on the
number $n$. For $n = 1$, the determinant of the matrix $(a_{11})$ is simply the number $a_{11}$.
Suppose that we know how to compute the determinant of an arbitrary matrix of
order $(n - 1)$. We then define the determinant of a square matrix $A$ as the alternating sum

\[
|A| = a_{11}D_1 - a_{21}D_2 + a_{31}D_3 - a_{41}D_4 + \cdots + (-1)^{n+1}a_{n1}D_n, \tag{2.12}
\]

where $D_k$ is the determinant of order $(n - 1)$ obtained from the matrix $A$ by deleting
the first column and the $k$th row. (The reader should verify that for $n = 2$ and $n = 3$
we obtain the same formulas for determinants of order 2 and 3 presented in the
previous section.)
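The inductive definition can be transcribed almost literally as a recursive function. The sketch below is meant only to make the definition concrete: it performs on the order of $n!$ multiplications and is not a practical algorithm. The function name `det` and the representation of a matrix as a list of rows are assumptions of the example.

```python
def det(a):
    """Determinant by expansion along the first column, as in definition (2.12).

    `a` is a square matrix given as a list of rows (lists of numbers).
    """
    n = len(a)
    if n == 1:
        return a[0][0]               # base case: the determinant of (a11) is a11
    total = 0
    for k in range(n):               # k is 0-based, so the sign is (-1)**k
        # D_k: delete the first column and the k-th row.
        minor = [row[1:] for i, row in enumerate(a) if i != k]
        total += (-1) ** k * a[k][0] * det(minor)
    return total


print(det([[1, 2], [3, 4]]))                    # prints: -2
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))   # prints: 18
```

For the $2 \times 2$ example the function returns $1 \cdot 4 - 3 \cdot 2 = -2$, in agreement with formula (2.3).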

Let us now introduce some useful notation and terminology. The determinant of
the matrix $A$ is denoted by

\[
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
\]

or simply by $|A|$, for short. If we delete the $i$th row and the $j$th column of the
matrix $A$ and preserve the ordering of the remaining elements, then we end up with
a matrix of order $(n - 1)$. Its determinant is denoted by $M_{ij}$ and is called a minor
of the matrix $A$, or more precisely, the minor associated with the element $a_{ij}$. With
this notation, (2.12) can be written in the form

\[
|A| = a_{11}M_{11} - a_{21}M_{21} + a_{31}M_{31} - \cdots + (-1)^{n+1}a_{n1}M_{n1}. \tag{2.13}
\]

This formula can be expressed in words thus: The determinant of an $n \times n$ matrix is
equal to the sum of the elements of the first column each multiplied by its associated
minor, where the sum is taken with alternating signs, beginning with plus.


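Example 2.1 Suppose a particular square matrix $A$ of order $n$ has the property that
all of its elements in the first column are equal to zero except for the element in the
first row. That is,

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & a_{n2} & \cdots & a_{nn}
\end{pmatrix}.
\]

Then in (2.13), all the terms except the first are equal to zero, and formula (2.13)
gives the equality

\[
|A| = a_{11}|A'|, \tag{2.14}
\]

where the matrix

\[
A' = \begin{pmatrix}
a_{22} & \cdots & a_{2n} \\
\vdots & \ddots & \vdots \\
a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\]

is of order $n - 1$.

There is a useful generalization of (2.14) that we shall now prove.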
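Theorem 2.2 We have the following formula for the determinant of a square matrix
$A$ of order $n + m$ for which every element in the intersection of the first $n$ columns
and last $m$ rows is zero:

\[
|A| = \begin{vmatrix}
a_{11} & \cdots & a_{1n} & a_{1\,n+1} & \cdots & a_{1\,n+m} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
a_{n1} & \cdots & a_{nn} & a_{n\,n+1} & \cdots & a_{n\,n+m} \\
0 & \cdots & 0 & b_{11} & \cdots & b_{1m} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & b_{m1} & \cdots & b_{mm}
\end{vmatrix}
= \begin{vmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix}
\cdot
\begin{vmatrix}
b_{11} & \cdots & b_{1m} \\
\vdots & \ddots & \vdots \\
b_{m1} & \cdots & b_{mm}
\end{vmatrix}. \tag{2.15}
\]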
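Proof We again make use of the definition of a determinant, namely formula (2.13),
now of order $n + m$, and we again employ induction on $n$. In our case, the last $m$
terms of (2.13) are equal to zero, and so we obtain

\[
|A| = a_{11}\overline{M}_{11} - a_{21}\overline{M}_{21} + a_{31}\overline{M}_{31} - \cdots + (-1)^{n+1}a_{n1}\overline{M}_{n1}, \tag{2.16}
\]

where $\overline{M}_{i1}$ denotes the minor of the matrix $A$ of order $n + m$ associated with the
element $a_{i1}$. It is now clear that $\overline{M}_{i1}$ is a determinant of the same type as $|A|$, but of order
$n - 1 + m$. Therefore, by the induction hypothesis, we can apply the theorem to this
determinant, obtaining

\[
\overline{M}_{i1} = M_{i1} \cdot \begin{vmatrix}
b_{11} & \cdots & b_{1m} \\
\vdots & \ddots & \vdots \\
b_{m1} & \cdots & b_{mm}
\end{vmatrix}, \tag{2.17}
\]

where $M_{i1}$ has the same meaning as in (2.13) for the determinant of order $n$ formed by
the elements $a_{ij}$, $i, j = 1, \ldots, n$, that is, for the first factor on the right-hand side of (2.15). Substituting
expressions (2.17) into (2.16) and using (2.13) for this determinant of order $n$, we obtain relation (2.15).
The theorem is proved.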
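Theorem 2.2 can be spot-checked on a small example. The sketch below takes $n = m = 2$ and compares the determinant of the full matrix with the product of the determinants of the two diagonal blocks. The helper `det` is an assumed transcription of the first-column expansion (2.13), and the entries of the matrix are chosen arbitrarily for illustration.

```python
def det(a):
    """Determinant by expansion along the first column (formula (2.13))."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** k * a[k][0]
               * det([row[1:] for i, row in enumerate(a) if i != k])
               for k in range(n))


# A matrix of order n + m = 4 whose lower-left 2 x 2 block is zero.
A = [[1, 2, 7, 8],
     [3, 4, 9, 10],
     [0, 0, 5, 6],
     [0, 0, 1, 2]]
upper_left = [[1, 2], [3, 4]]    # the elements a_ij
lower_right = [[5, 6], [1, 2]]   # the elements b_ij

print(det(A))                                # prints: -8
print(det(upper_left) * det(lower_right))    # prints: -8
```

The entries 7, 8, 9, 10 in the upper-right block have no influence on the result, as formula (2.15) predicts.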


Remark 2.3 One may well ask why in our definition the first column played a spe- 
cial role and what sort of expressions we might obtain were we to formulate the 
definition in terms not of the first column, but of the second, third, ..., column. As 
we shall see, the expression obtained will differ from the determinant by at most a 
sign. 


Now let us consider some of the basic properties of determinants. Later on, we 
shall see that in the theory of determinants, just as in the theory of systems of linear 
equations, an important role is played by elementary row operations. Let us note 
that elementary operations like those of type I and type II can be applied to the rows 
of a matrix whether or not it is the matrix of a system of equations. Theorem 1.15 
shows that an arbitrary matrix can be transformed into echelon and triangular form. 

Therefore, it will be useful to figure out how elementary operations on the rows of
a matrix affect the matrix’s determinant. In connection with this, we shall introduce
some special notation for the rows of a matrix $A$: We shall denote by $a_i$ the $i$th row
of $A$, $i = 1, \ldots, n$. Thus

\[
a_i = (a_{i1}, a_{i2}, \ldots, a_{in}).
\]


We shall prove several important properties of determinants. We shall prove Proper- 
ties 2.4, 2.6, and 2.7 below by induction on the order n of the determinant. For n = 1 
(or for Property 2.6, for n = 2), these properties are obvious, and we shall omit a 
proof. We can therefore assume in the proof that the properties have been proved for 
determinants of order n — 1. 




By definition (2.13), a determinant is a function that assigns to the matrix $A$ a
certain number $|A|$. We shall now assume that all the rows of the matrix $A$ except
for one, let us say the $i$th, are fixed, and we shall explain how the determinant
depends on the elements of the $i$th row $a_i$.


Property 2.4 The determinant of a matrix is a linear function of the elements of an 
arbitrary row of the matrix. 


Proof Let us suppose that we wish to prove this property for the $i$th row of matrix $A$.
We shall use formula (2.13) and show that every term in it is a linear function of the
elements of the $i$th row. For this, it suffices to choose numbers $d_{1j}, d_{2j}, \ldots, d_{nj}$ such
that

\[
\pm a_{j1}M_{j1} = d_{1j}a_{i1} + d_{2j}a_{i2} + \cdots + d_{nj}a_{in}
\]

for all $j = 1, 2, \ldots, n$ (see the definition of linear function on p. 2). We begin with
the term $\pm a_{i1}M_{i1}$. Since the minor $M_{i1}$ does not depend on the elements of the $i$th
row (the $i$th row is ignored in the calculation), it is simply a constant as a function
of the $i$th row. Let us set $d_{1i} = \pm M_{i1}$ and $d_{2i} = d_{3i} = \cdots = d_{ni} = 0$. Then the first
term is represented in the required form, and indeed is a linear function of the $i$th
row of the matrix $A$. For the term $\pm a_{j1}M_{j1}$, for $j \neq i$, the element $a_{j1}$ does not
appear in the $i$th row, but all the elements of the $i$th row of matrix $A$ other than $a_{i1}$
appear in some row of the minor $M_{j1}$. Therefore, by the induction hypothesis, $M_{j1}$
is a linear function of these elements, that is,

\[
M_{j1} = d'_{2j}a_{i2} + \cdots + d'_{nj}a_{in}
\]

for some numbers $d'_{2j}, \ldots, d'_{nj}$. Setting $d_{2j} = a_{j1}d'_{2j}, \ldots, d_{nj} = a_{j1}d'_{nj}$, and
$d_{1j} = 0$, we convince ourselves that $a_{j1}M_{j1}$ is a linear function of the $i$th row of
matrix $A$, but this means that such is also the case for the function $\pm a_{j1}M_{j1}$.
Therefore, $|A|$ is the sum of linear functions of the elements of the $i$th row, and it follows
that $|A|$ is itself a linear function (see p. 4).


Corollary 2.5 If we apply Theorem 1.3 to a determinant as a function of its $i$th
row,¹ then we obtain the following:

1. Multiplication of each of the elements of the $i$th row of a matrix $A$ by the number
$p$ multiplies the determinant $|A|$ by the same number.

2. If all elements of the $i$th row of matrix $A$ are of the form $a_{ij} = b_j + c_j$, then its
determinant $|A|$ is equal to the sum of the determinants of two matrices, in each
of which all the elements other than the elements in the $i$th row are the same as
in the original, and in the $i$th row of the first determinant, instead of the elements
$a_{ij}$, one has the numbers $b_j$, while in the $i$th row of the other one, the numbers
are $c_j$.

¹ We are being a bit sloppy with language here. We have defined the determinant as a function that
assigns a number to a matrix, so when we speak of the “rows of a determinant,” this is shorthand
for the rows of the underlying matrix.


Property 2.6 The transposition of two rows of a determinant changes its sign. 


Proof We again begin with formula (2.13). Let us assume that we have interchanged
the positions of rows $j$ and $j + 1$. We first consider the term $a_{i1}M_{i1}$, where $i \neq j$
and $i \neq j + 1$. Then interchanging the $j$th and $(j + 1)$st rows does not affect the
element $a_{i1}$. As for the minor $M_{i1}$, it contains the elements of both the $j$th and
$(j + 1)$st rows of the original matrix (other than the first element of each row),
where they again fill two neighboring rows. Therefore, by the induction hypothesis,
the minor $M_{i1}$ changes sign when the rows are transposed. Thus every term $a_{i1}M_{i1}$
with $i \neq j$ and $i \neq j + 1$ changes sign with a transposition of the $j$th and $(j + 1)$st
rows. The remaining terms have the form

\[
(-1)^{j+1}a_{j1}M_{j1} + (-1)^{j+2}a_{j+1\,1}M_{j+1\,1}
= (-1)^{j+1}\bigl(a_{j1}M_{j1} - a_{j+1\,1}M_{j+1\,1}\bigr). \tag{2.18}
\]

With a transposition of the $j$th and $(j + 1)$st rows, it is easily seen that the terms
$a_{j1}M_{j1}$ and $a_{j+1\,1}M_{j+1\,1}$ exchange places, which means that the entire expression
(2.18) changes sign. This proves Property 2.6.


In what follows, a prominent role will be played by the square matrices

\[
E = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}, \tag{2.19}
\]

all of whose elements on the main diagonal are equal to 1 and all of whose
nondiagonal elements are equal to zero. Such a matrix $E$ is called an identity matrix. Of
course, for every natural number $n$ there exists an identity matrix of order $n$, and
when we wish to emphasize the order of the identity matrix under consideration, we
shall write $E_n$.


Property 2.7 The determinant of the identity matrix $E_n$, for all $n \geq 1$, is equal to 1.

Proof In formula (2.13), $a_{i1} = 0$ if $i \neq 1$, and $a_{11} = 1$. Therefore, $|E| = M_{11}$. The
determinant $M_{11}$ has the same structure as $|E|$, but its order is $n - 1$. By the
induction hypothesis, we may assume that $M_{11} = 1$, which means that $|E| = 1$.


In proving Properties 2.4, 2.6, and 2.7, it was necessary to use definition (2.13). 
Now we shall prove a series of properties of the determinant that can be formally 
derived from these first three properties. 




Property 2.8 If all the elements of a row of a matrix are equal to 0, then the
determinant of the matrix is equal to 0.


Proof Let $a_{i1} = a_{i2} = \cdots = a_{in} = 0$. We may set $a_{ik} = pb_{ik}$, where $p = 0$, $b_{ik} \neq 0$,
$k = 1, \ldots, n$, and apply the first assertion of Corollary 2.5. We obtain that $|A| =
p|A'|$, where $|A'|$ is some other determinant and the number $p$ is equal to zero. We
conclude that $|A| = 0$.


Property 2.9 If we transpose any two (not necessarily adjacent) rows of a determi- 
nant, then the determinant changes sign. 


Proof Let us transpose the $i$th and $j$th rows, where $i < j$. The same result can be
achieved by successively transposing adjacent rows. Namely, we begin by
transposing the $i$th and $(i + 1)$st rows, then the $(i + 1)$st and $(i + 2)$nd, and so on until
the $i$th row has been moved adjacent to the $j$th row, that is, into the $(j - 1)$st
position. At this point, we have carried out $j - i - 1$ transpositions of adjacent
rows. Then we transpose the $(j - 1)$st and $j$th rows, thereby increasing the
number of transpositions to $j - i$. We then transpose the $j$th row with its
successive neighbors so that it occupies the $i$th position. In the end, we will have
exchanged the positions of the $i$th and $j$th rows, with all other rows occupying their
original positions. In carrying out this process, we have transposed adjacent rows
$(j - i - 1) + 1 + (j - i - 1) = 2(j - i - 1) + 1$ times. This is an odd number.
Therefore, by Property 2.6, which asserts that interchanging two adjacent rows of a matrix
results in a change of sign in the determinant, the result of all transpositions in this
process is a change in the determinant’s sign.
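The counting argument in this proof can be replayed mechanically. The following sketch exchanges two entries of a list using only swaps of neighbors, in the three stages described above, and counts the swaps. The function name is an assumption of the example, and indices are 0-based, which does not affect the count $2(j - i - 1) + 1$.

```python
def swap_by_adjacent(rows, i, j):
    """Exchange rows[i] and rows[j] (0-based, i < j) using adjacent swaps only.

    Returns the rearranged list together with the number of adjacent swaps.
    """
    rows = list(rows)
    count = 0
    # Stage 1: move the i-th row down until it is next to the j-th row.
    for k in range(i, j - 1):
        rows[k], rows[k + 1] = rows[k + 1], rows[k]
        count += 1
    # Stage 2: exchange it with its neighbor, the original j-th row.
    rows[j - 1], rows[j] = rows[j], rows[j - 1]
    count += 1
    # Stage 3: move the original j-th row up into position i.
    for k in range(j - 1, i, -1):
        rows[k], rows[k - 1] = rows[k - 1], rows[k]
        count += 1
    return rows, count


result, count = swap_by_adjacent(["a", "b", "c", "d", "e", "f"], 1, 4)
print(result, count)  # prints: ['a', 'e', 'c', 'd', 'b', 'f'] 5
```

Here $i = 1$, $j = 4$, and the count is $2(4 - 1 - 1) + 1 = 5$, an odd number, while all rows other than the two exchanged ones end up in their original positions.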


Property 2.9 can also be stated thus: An elementary operation of type I on the 
rows of a determinant changes its sign. 


Property 2.10 If two rows of a matrix A are equal, then the determinant | A| is equal 
to zero. 


Proof Let us transpose the two equal rows of $A$. Then obviously, the matrix, and hence the determinant
$|A|$, does not change. But by Property 2.9, the determinant changes sign. Hence
we have $|A| = -|A|$, that is, $2|A| = 0$, from which we may conclude that $|A| = 0$.


Property 2.11 If an elementary operation of type II is performed on a determinant,
it is unchanged.


Proof Suppose that after adding $c$ times the $j$th row of $A$ to the $i$th row, we obtain
the matrix $A'$. Its $i$th row is the sum of two rows, and by the second assertion
of Corollary 2.5, we have the equality $|A'| = D_1 + D_2$, where $D_1 = |A|$. As for
the determinant $D_2$, it differs from $|A|$ in that in the $i$th row, it has $c$ times the
$j$th row. The factor $c$ can be taken outside the determinant by the first assertion
of Corollary 2.5. Then we have a determinant whose $i$th and $j$th rows are equal.
But by Property 2.10, such a determinant is equal to zero. Hence $D_2 = 0$, and so
$|A'| = |A|$.
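The row properties proved so far can be spot-checked on a single matrix. Such a check is of course not a proof; it only illustrates what each statement says. In the sketch below, `det` is an assumed transcription of the first-column expansion (2.13), and the matrix and the numbers used in the row operations are arbitrary choices.

```python
def det(a):
    """Determinant by expansion along the first column (formula (2.13))."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** k * a[k][0]
               * det([row[1:] for i, row in enumerate(a) if i != k])
               for k in range(n))


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
d = det(A)

# Corollary 2.5, assertion 1: multiplying one row by p multiplies |A| by p.
p = 5
scaled = [A[0], [p * x for x in A[1]], A[2]]

# Corollary 2.5, assertion 2: the second row (1, 3, 2) is written as b + c.
with_b = [A[0], [1, 1, 1], A[2]]
with_c = [A[0], [0, 2, 1], A[2]]

swapped = [A[2], A[1], A[0]]     # Property 2.9: two rows exchanged
equal_rows = [A[0], A[1], A[0]]  # Property 2.10: two equal rows
added = [[x + 7 * y for x, y in zip(A[0], A[2])], A[1], A[2]]  # Property 2.11

print(det(scaled) == p * d)
print(det(with_b) + det(with_c) == d)
print(det(swapped) == -d)
print(det(equal_rows) == 0)
print(det(added) == d)
```

Each line prints `True` for this matrix.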


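We remark that the properties proven above give us a very simple method for
computing a determinant of order $n$. We have only to apply elementary operations
to bring the matrix $A$ into upper triangular form:

\[
\overline{A} = \begin{pmatrix}
\overline{a}_{11} & \overline{a}_{12} & \cdots & \overline{a}_{1n} \\
0 & \overline{a}_{22} & \cdots & \overline{a}_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \overline{a}_{nn}
\end{pmatrix}.
\]

Let us suppose that in the process of doing this, we have completed $t$ elementary
operations of type I and some number of operations of type II. Since operations
of type II do not change the determinant, and an operation of type I multiplies the
determinant by $-1$, we have $|A| = (-1)^t|\overline{A}|$. We shall now show that

\[
|\overline{A}| = \overline{a}_{11}\overline{a}_{22} \cdots \overline{a}_{nn}. \tag{2.20}
\]

Then

\[
|A| = (-1)^t\,\overline{a}_{11}\overline{a}_{22} \cdots \overline{a}_{nn}. \tag{2.21}
\]

This is a formula for calculating $|A|$.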
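The method just described can be sketched in code as follows. The function name `det_by_elimination` is an assumption of this example, and exact fractions are used so that the divisions arising in type II operations introduce no rounding. The counter `t` records the number of row interchanges (type I operations), exactly as in the formula for $|A|$ above.

```python
from fractions import Fraction


def det_by_elimination(a):
    """Determinant via reduction to upper triangular form.

    Row interchanges (type I) are counted in t; adding a multiple of one row
    to another (type II) leaves the determinant unchanged.
    """
    m = [[Fraction(x) for x in row] for row in a]
    n = len(m)
    t = 0
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            # The diagonal element of the triangular form in this column is 0,
            # so the product of the diagonal elements vanishes.
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]   # type I operation
            t += 1
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Type II operation: subtract factor times row `col` from row r.
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    product = Fraction(1)
    for k in range(n):
        product *= m[k][k]
    return (-1) ** t * product


print(det_by_elimination([[0, 2, 1], [1, 3, 2], [1, 1, 4]]))   # prints: -6
print(det_by_elimination([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))   # prints: 18
```

In the first example one interchange of rows is needed, the triangular form has diagonal $1, 2, 3$, and the result is $(-1)^1 \cdot 6 = -6$. Unlike the recursive definition, this method requires a number of operations that grows only like $n^3$.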

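We shall prove formula (2.20) by induction on $n$. Since in the matrix $\overline{A}$, all
elements of the first column except $\overline{a}_{11}$ are equal to zero, it follows by formula (2.14)
that we have the equality

\[
|\overline{A}| = \overline{a}_{11}|\overline{A}'|, \tag{2.22}
\]

in which the determinant

\[
|\overline{A}'| = \begin{vmatrix}
\overline{a}_{22} & \overline{a}_{23} & \cdots & \overline{a}_{2n} \\
0 & \overline{a}_{33} & \cdots & \overline{a}_{3n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \overline{a}_{nn}
\end{vmatrix}
\]

has a structure analogous to that of the determinant $|\overline{A}|$. By the induction hypothesis,
we obtain the equality $|\overline{A}'| = \overline{a}_{22}\overline{a}_{33} \cdots \overline{a}_{nn}$. Substituting this expression into (2.22)
yields the formula (2.20) for $|\overline{A}|$.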

The properties of determinants that we have proved allow us to conclude an im- 
portant theorem on linear equations. 


Theorem 2.12 A system of n equations in n unknowns has a unique solution if and 
only if the determinant of the matrix of the system is different from zero. 




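Proof We bring the system into triangular form:

\[
\begin{aligned}
\overline{a}_{11}x_1 + \overline{a}_{12}x_2 + \cdots + \overline{a}_{1n}x_n &= \overline{b}_1,\\
\overline{a}_{22}x_2 + \cdots + \overline{a}_{2n}x_n &= \overline{b}_2,\\
\cdots\cdots\cdots & \\
\overline{a}_{nn}x_n &= \overline{b}_n.
\end{aligned}
\]

By Corollary 1.19, the system has a unique solution if and only if

\[
\overline{a}_{11} \neq 0, \quad \overline{a}_{22} \neq 0, \quad \ldots, \quad \overline{a}_{nn} \neq 0. \tag{2.23}
\]

On the other hand, by formula (2.21), the determinant of the matrix of the system is equal, up to sign, to the product
$\overline{a}_{11}\overline{a}_{22} \cdots \overline{a}_{nn}$, and it follows that it is different from zero if and only if (2.23) is
satisfied.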


Corollary 2.13 A homogeneous system of n equations in n unknowns has a nonzero 
solution if and only if the determinant of the matrix of the system is equal to zero. 


This result is an obvious consequence of the theorem, since a homogeneous sys- 
tem of equations always has at least one solution, namely the null solution. 


Definition 2.14 A square matrix whose determinant is nonzero is said to be non- 
singular. Conversely, a matrix whose determinant is equal to zero is singular. 


In Sect. 2.1, we interpreted the determinant of order two as the area of a triangle 
in the plane, while a 3 x 3 determinant was viewed as the volume of a tetrahedron 
in three-dimensional space (with suitable coefficients). Clearly, the area of a trian- 
gle reduces to zero only if it degenerates into a line segment, and the volume of a 
tetrahedron is zero only if the tetrahedron degenerates into a planar figure. 

Such examples give an idea of the geometric sense of the singularity of a matrix. 
The notion of singularity will become clearer in Sect. 2.10, when we introduce the 
notion of inverse matrix, and most importantly, in subsequent chapters when we 
consider linear transformations of vector spaces. 


2.3 Properties that Characterize Determinants 
In the preceding section we said that the determinant is a function that assigns to a 
square matrix a number, and we proved two important properties of the determinant: 


1. The determinant is a linear function of the elements in each row. 
2. Transposing two rows of a determinant changes its sign. 


We shall now show that the determinant is in fact completely characterized by these 
properties, as formulated in the following theorem. 




Theorem 2.15 Let F(A) be a function that assigns to a square matrix A of order n 
a certain number. If this function satisfies properties 1 and 2 above, then there exists 
a number k such that 


\[
F(A) = k|A|. \tag{2.24}
\]


In this case, the number k is equal to F(E), where E is the identity matrix. 


Proof First of all, we observe that from properties 1 and 2 it follows that the function
$F(A)$ is unchanged if we apply to the matrix $A$ an elementary operation of type II,
and that it changes sign if we apply an elementary operation of type I. Indeed, the
proofs of the corresponding properties of the
determinant (Properties 2.9 and 2.11 of Sect. 2.2) used nothing beyond properties 1 and 2 above.

Let us now bring matrix $A$ into echelon form using elementary operations. We
write the matrix thus obtained in the form

\[
\overline{A} = \begin{pmatrix}
\overline{a}_{11} & \overline{a}_{12} & \cdots & \overline{a}_{1n} \\
0 & \overline{a}_{22} & \cdots & \overline{a}_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \overline{a}_{nn}
\end{pmatrix}, \tag{2.25}
\]

whereby we do not, however, assert that $\overline{a}_{11} \neq 0, \ldots, \overline{a}_{nn} \neq 0$. Such a form can
always be obtained, since for a square matrix in echelon form, all elements $\overline{a}_{ij}$,
$i > j$, that is, those below the main diagonal, are equal to zero. Let us assume that
in the transition from $A$ to $\overline{A}$, we have performed $t$ elementary operations of type I,
while all the other operations were of type II. Since under an elementary operation
of type II neither $F(A)$ nor $|A|$ is changed, and under elementary operations of
type I, both expressions change sign, it follows that

\[
|A| = (-1)^t|\overline{A}|, \qquad F(A) = (-1)^tF(\overline{A}). \tag{2.26}
\]

In order to prove formula (2.24) in the general case, it now suffices to prove it for
matrices $\overline{A}$ of the form (2.25), that is, to establish the equality $F(\overline{A}) = k|\overline{A}|$, which,
in turn, clearly follows from the relationships

\[
|\overline{A}| = \overline{a}_{11}\overline{a}_{22} \cdots \overline{a}_{nn}, \qquad F(\overline{A}) = F(E) \cdot \overline{a}_{11}\overline{a}_{22} \cdots \overline{a}_{nn}. \tag{2.27}
\]

We observe that the first of these equalities is precisely the equality (2.20) from
the previous section. Moreover, it is a consequence of the second equality, since
the determinant $|A|$, as we have shown, is also a function of type $F(A)$, possessing
properties 1 and 2. And therefore, having proved the second equality in (2.27) for an
arbitrary function $F(A)$ possessing the given properties, we shall prove this again
for the determinant.




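It thus remains only to prove the second equality of (2.27). In view of property 1,
we can take out from $F(\overline{A})$ the factor $\overline{a}_{nn}$:

\[
F(\overline{A}) = \overline{a}_{nn} \cdot F\begin{pmatrix}
\overline{a}_{11} & \overline{a}_{12} & \cdots & \overline{a}_{1n} \\
0 & \overline{a}_{22} & \cdots & \overline{a}_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}.
\]

Let us now add to rows $1, 2, \ldots, n - 1$ the last row multiplied by the numbers
$-\overline{a}_{1n}, -\overline{a}_{2n}, \ldots, -\overline{a}_{n-1\,n}$ respectively. In this case, all elements, except the elements
of the last column, are unchanged, and all the elements of the last column become
equal to zero, with the exception of the $n$th, which remains equal to 1. Then let us
apply analogous transformations to the matrix of smaller size with elements located
in the first $n - 1$ rows and columns, and so on. Each time, the number $\overline{a}_{ii}$ is factored
out of $F$, and the argument is repeated. After doing this $n$ times, we obtain

\[
F(\overline{A}) = \overline{a}_{nn} \cdots \overline{a}_{11} \cdot F\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix},
\]

which is the second equality of (2.27).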


2.4 Expansion of a Determinant Along Its Columns 


On the basis of Theorem 2.15, we can answer a question that arose earlier, in 
Sect. 2.2: does the first column play a special role in (2.12) and (2.13) for a de- 
terminant of order n? To answer this question, let us form an expression analogous 
to (2.13), but taking instead of the first column, the jth column. In other words, let 
us consider the function 


\[
F(A) = a_{1j}M_{1j} - a_{2j}M_{2j} + \cdots + (-1)^{n+1}a_{nj}M_{nj}. \tag{2.28}
\]

It is clear that this function assigns to every matrix $A$ of order $n$ a specific number.
Let us verify that it satisfies conditions 1 and 2 of the previous section. To this end,
we have simply to examine the proofs of the properties from Sect. 2.2 and convince
ourselves that we never used the fact that it was precisely the elements of the first
column that were multiplied by their respective minors. In other words, the proofs
of these properties apply word for word to the function $F(A)$. By Theorem 2.15,
we have $F(A) = k|A|$, and we have only to determine the number $k$ from the formula
$k = F(E)$.

For the matrix $E$, all the elements $a_{ij}$ are equal to zero whenever $i \neq j$, and the elements $a_{ii}$ are equal to 1. Therefore, formula (2.28) reduces to the equality $F(E) = \pm M_{jj}$. Since in formula (2.28) the signs alternate, the term $a_{jj}M_{jj}$ appears with the sign $(-1)^{j+1}$. Clearly, $M_{jj}$ is the determinant of the identity matrix $E$ of order $n-1$, and therefore, $M_{jj} = 1$. As a result, we obtain that $k = (-1)^{j+1}$, which means that

\[
a_{1j}M_{1j} - a_{2j}M_{2j} + \cdots + (-1)^{n+1} a_{nj}M_{nj} = (-1)^{j+1}|A|.
\]

We now move the coefficient $(-1)^{j+1}$ to the left-hand side:

\[
|A| = (-1)^{j+1} a_{1j}M_{1j} + (-1)^{j+2} a_{2j}M_{2j} + \cdots + (-1)^{j+n} a_{nj}M_{nj}. \tag{2.29}
\]

We see that the element $a_{ij}$ is multiplied by the expression $(-1)^{i+j}M_{ij}$, which is called its cofactor and denoted by $A_{ij}$. We have therefore obtained the following result.


Theorem 2.16 The determinant of a matrix A is equal to the sum of the elements 
from any of its columns each multiplied by its associated cofactor: 


\[
|A| = a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj}. \tag{2.30}
\]


In this statement, each column plays an identical role to that played by any other 
column. For the first column, it becomes the formula that defines the determinant. 
Formulas (2.29) and (2.30) are called the expansion of the determinant along the 
jth column. 
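
As a numerical illustration of Theorem 2.16, the following minimal sketch expands one integer matrix along each of its columns and compares the results. The helper names `minor`, `det`, and `expand_along_column` and the sample matrix are assumptions made for this illustration only; indices in the code are 0-based, which does not affect the parity of $i + j$.

```python
def minor(a, i, j):
    """Copy of matrix a with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det(a):
    """Determinant by expansion along the first column (inductive definition)."""
    if not a:  # empty matrix: convention that makes the order-1 case work
        return 1
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(len(a)))


def expand_along_column(a, j):
    """Sum over i of a_ij * A_ij, where the cofactor is A_ij = (-1)^(i+j) * M_ij."""
    return sum(a[i][j] * (-1) ** (i + j) * det(minor(a, i, j))
               for i in range(len(a)))


A = [[2, -1, 0, 3],
     [1, 4, 5, -2],
     [0, 3, -1, 1],
     [7, 2, 6, 0]]

column_values = [expand_along_column(A, j) for j in range(len(A))]
print(det(A), column_values)  # each entry of column_values equals det(A)
```

The recursive expansion takes on the order of $n!$ operations, so it is suitable only for checking small examples.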

As an application of Theorem 2.16, we can obtain a whole series of new proper- 
ties of determinants. 


Theorem 2.17 Properties 2.4, 2.6, 2.7, 2.8, 2.9, 2.10, 2.11 and all their corollaries 
hold not only for the rows of a determinant, but for the columns as well. 


Proof It follows from formula (2.30) that the determinant is a linear function of the elements of the $j$th column, $j = 1, \ldots, n$. Consequently, Property 2.4 holds for the columns.

We shall prove Property 2.6 by induction on the order $n$ of the determinant. For $n = 1$, the assertion is empty. For $n = 2$, it can be checked using formula (2.3). Now let $n > 2$, and let us assume that we have transposed columns numbered $k$ and $k+1$. We make use of formula (2.30) for $j \neq k, k+1$. Then both the $k$th and the $(k+1)$st columns enter into every minor $M_{ij}$ ($i = 1, \ldots, n$). By the induction hypothesis, under a transposition of two columns, each minor will change sign, which means that the determinant as a whole changes sign, which proves Property 2.6 for columns. We observe that in Property 2.7, the statement does not discuss rows or columns, and the remaining properties follow formally from the first three. Therefore, all seven properties and their corollaries are valid for the columns of a determinant.




In analogy to Theorem 2.15, from Theorem 2.17 it follows that any multilinear antisymmetric function² of the columns of a matrix must be proportional to the determinant function of the matrix. Consequently, we have the analogue of formula (2.24), where the function $F(A)$ satisfies properties 1 and 2, reformulated for columns. In this case, the value $k$, as can easily be seen, remains the same. In particular, for an arbitrary index $i = 1, \ldots, n$, we have the formula, analogous to (2.30),

\[
|A| = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}. \tag{2.31}
\]

It is called the expansion of the determinant $|A|$ along the $i$th row. The formula for the column or row expansion of a determinant has a broad generalization that goes under the name Laplace's theorem. It consists in the fact that one has an analogous expansion of the determinant of a square matrix of order $n$ not only along a single column (or row), but for an arbitrary number $m$ of columns, $1 \le m \le n-1$. For this, it is necessary only to determine the cofactor not of a single element, but of the minor of arbitrary order $m$. Laplace's theorem can be proved, for example, by induction on the number $m$, but we shall not do this, but rather put off its precise formulation and proof to Sect. 10.5 (p. 379), where it will be obtained as a special case of even more general concepts and results.


Example 2.18 In Example 1.20 (p. 15), we proved that the problem of interpolation, that is, the search for a polynomial of degree $n$ that passes through $n+1$ given points, has a unique solution. Theorem 2.12 shows that the determinant of the matrix of the corresponding linear system (1.20) is different from zero. Now we can easily calculate this determinant and once again verify this property.

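The determinant of the matrix of system (1.20) for $r = n+1$ has the form

\[
|A| = \begin{vmatrix}
1 & c_1 & c_1^2 & \cdots & c_1^n \\
1 & c_2 & c_2^2 & \cdots & c_2^n \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & c_n & c_n^2 & \cdots & c_n^n \\
1 & c_{n+1} & c_{n+1}^2 & \cdots & c_{n+1}^n
\end{vmatrix}. \tag{2.32}
\]

It is called the Vandermonde determinant of order $n+1$. We shall show that this determinant is equal to the product of all differences $c_i - c_j$ for $i > j$, that is, that it can be written in the following form:

\[
|A| = \prod_{i > j} (c_i - c_j). \tag{2.33}
\]

We shall prove (2.33) by induction on the number $n$. For $n = 1$, the result is obvious:

\[
\begin{vmatrix} 1 & c_1 \\ 1 & c_2 \end{vmatrix} = c_2 - c_1.
\]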


²For the definition and a discussion of antisymmetric functions, see Sect. 2.6.




For the proof of the general case, we use the fact that the determinant does not change under an elementary operation of type II (Property 2.11 from Sect. 2.2), and moreover, from Theorem 2.17, this property holds for columns as well as for rows. We will make use of this by subtracting the $n$th column multiplied by $c_1$ from the $(n+1)$st, then the $(n-1)$st multiplied by $c_1$ from the $n$th, and so on, all the way to the second column, from which we subtract the first multiplied by $c_1$. By the indicated property, the determinant does not change under these operations, but on the other hand, it assumes the form

\[
|A| = \begin{vmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & c_2 - c_1 & c_2(c_2 - c_1) & \cdots & c_2^{n-1}(c_2 - c_1) \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & c_n - c_1 & c_n(c_n - c_1) & \cdots & c_n^{n-1}(c_n - c_1) \\
1 & c_{n+1} - c_1 & c_{n+1}(c_{n+1} - c_1) & \cdots & c_{n+1}^{n-1}(c_{n+1} - c_1)
\end{vmatrix}.
\]


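Making use of Theorem 2.17, we apply to the first row of the determinant thus obtained (consisting of a single nonzero element) the analogue of formula (2.12). As a result, we obtain

\[
|A| = \begin{vmatrix}
c_2 - c_1 & c_2(c_2 - c_1) & \cdots & c_2^{n-1}(c_2 - c_1) \\
\vdots & \vdots & \ddots & \vdots \\
c_n - c_1 & c_n(c_n - c_1) & \cdots & c_n^{n-1}(c_n - c_1) \\
c_{n+1} - c_1 & c_{n+1}(c_{n+1} - c_1) & \cdots & c_{n+1}^{n-1}(c_{n+1} - c_1)
\end{vmatrix}.
\]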


To the last determinant let us apply Corollary 2.5 of Sect. 2.2 and remove from each row its common factor. We obtain

\[
|A| = (c_2 - c_1) \cdots (c_n - c_1)(c_{n+1} - c_1)
\begin{vmatrix}
1 & c_2 & \cdots & c_2^{n-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & c_n & \cdots & c_n^{n-1} \\
1 & c_{n+1} & \cdots & c_{n+1}^{n-1}
\end{vmatrix}. \tag{2.34}
\]

The last determinant is a Vandermonde determinant of order $n$, and by the induction hypothesis, we can assume that formula (2.33) holds for it. Putting the expression (2.33) for a Vandermonde determinant of order $n$ into expression (2.34), we obtain the desired formula (2.33) for a Vandermonde determinant of order $n+1$. Since we have assumed that all the numbers $c_1, \ldots, c_{n+1}$ are distinct, the product of the differences $c_i - c_j$ for $i > j$ must be different from zero, and we obtain a new proof of the result that polynomial interpolation as described has a unique solution.
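
Formula (2.33) can be checked numerically on a small case. The sketch below is only an illustration under stated assumptions: the function names and the sample nodes are hypothetical, and the determinant is computed by plain first-column expansion rather than by the column operations used in the proof.

```python
from math import prod


def minor(a, i, j):
    """Copy of matrix a with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det(a):
    """Determinant by expansion along the first column."""
    if not a:
        return 1
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(len(a)))


def vandermonde(c):
    """Matrix whose row for the node c_i is (1, c_i, c_i^2, ...), as in (2.32)."""
    return [[ci ** p for p in range(len(c))] for ci in c]


def product_of_differences(c):
    """Right-hand side of (2.33): product of c_i - c_j over all i > j."""
    return prod(c[i] - c[j] for i in range(len(c)) for j in range(i))


nodes = [2, -1, 5, 3]           # distinct sample nodes (an assumption)
print(det(vandermonde(nodes)), product_of_differences(nodes))

repeated = [2, -1, 2, 3]        # two equal nodes: both sides vanish
print(det(vandermonde(repeated)), product_of_differences(repeated))
```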


2.5 Cramer’s Rule 


We are now going to derive explicit formulas for the solution of a system of $n$ equations in $n$ unknowns, formulas for which we have developed the theory of determinants. The matrix $A$ of this system is a square matrix of order $n$, and we shall assume that it is not singular.


Lemma 2.19 The sum of the elements $a_{ij}$ of an arbitrary (here the $j$th) column of a determinant each multiplied by the cofactor $A_{ik}$ corresponding to the elements of any other column (here the $k$th) is equal to zero:

\[
a_{1j}A_{1k} + a_{2j}A_{2k} + \cdots + a_{nj}A_{nk} = 0 \quad \text{for } k \neq j.
\]

Proof We replace the $k$th column in our determinant $|A|$ with its $j$th column. As a result, we obtain a determinant $|A'|$ that by Property 2.10 of Sect. 2.2, reformulated for columns, is equal to zero. On the other hand, let us expand the determinant $|A'|$ along the $k$th column. Since in forming the cofactors of this column, the elements of the $k$th column cancel, we obtain the same cofactors $A_{ik}$ as in our original determinant $|A|$. Therefore, we obtain

\[
|A'| = a_{1j}A_{1k} + a_{2j}A_{2k} + \cdots + a_{nj}A_{nk} = 0,
\]

which is what we wished to show.
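
Lemma 2.19 and Theorem 2.16 together say that pairing column $j$ with the cofactors of column $k$ gives $|A|$ when $j = k$ and zero otherwise. A minimal sketch of this check follows; the helper names and the sample matrix are assumptions introduced here, and indices are 0-based.

```python
def minor(a, i, j):
    """Copy of matrix a with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det(a):
    """Determinant by expansion along the first column."""
    if not a:
        return 1
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(len(a)))


def cofactor(a, i, k):
    """Cofactor A_ik = (-1)^(i+k) * M_ik."""
    return (-1) ** (i + k) * det(minor(a, i, k))


def column_against_cofactors(a, j, k):
    """Sum over i of a_ij * A_ik: elements of column j, cofactors of column k."""
    return sum(a[i][j] * cofactor(a, i, k) for i in range(len(a)))


A = [[2, -1, 0],
     [1, 4, 5],
     [0, 3, -1]]

# table[j][k] should be det(A) on the diagonal and 0 off the diagonal.
table = [[column_against_cofactors(A, j, k) for k in range(3)] for j in range(3)]
for row in table:
    print(row)
```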


Theorem 2.20 (Cramer's rule) If the determinant of the matrix of a system of $n$ equations in $n$ unknowns is different from zero, then its solution is given by

\[
x_k = \frac{D_k}{D}, \quad k = 1, \ldots, n, \tag{2.35}
\]

where $D$ is the determinant of the matrix of the system, and $D_k$ is obtained from $D$ by replacing the $k$th column of the matrix with the column of constant terms.


Proof By Theorem 2.12, we know that there is a unique collection of values for $x_1, \ldots, x_n$ that transforms the system

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1, \\
&\;\;\vdots \\
a_{n1}x_1 + \cdots + a_{nn}x_n &= b_n
\end{aligned}
\]

into the identity. Let us determine the unknown $x_k$ for a given $k$.

To do so, we shall proceed exactly as in the case of systems of two and three equations from Sect. 2.1: we multiply the $i$th equation by the cofactor $A_{ik}$ and then sum all the resulting equations. After this, the coefficient of $x_k$ will have the form

\[
a_{1k}A_{1k} + \cdots + a_{nk}A_{nk} = D.
\]

The coefficient of $x_j$ for $j \neq k$ will assume the form

\[
a_{1j}A_{1k} + \cdots + a_{nj}A_{nk}.
\]




By Lemma 2.19, this number is equal to zero. Finally, for the constant term we obtain the expression

\[
b_1A_{1k} + \cdots + b_nA_{nk}.
\]

But it is precisely this expression that we obtain if we expand the determinant $D_k$ along its $k$th column. Therefore, we arrive at the equality

\[
Dx_k = D_k,
\]

and since $D \neq 0$, we have $x_k = D_k/D$. This is formula (2.35).
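
Formula (2.35) translates directly into a short program. The following is a minimal sketch, assuming integer coefficients and exact arithmetic with `fractions.Fraction`; the function name `cramer` and the sample system are hypothetical choices made for the illustration.

```python
from fractions import Fraction


def minor(a, i, j):
    """Copy of matrix a with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]


def det(a):
    """Determinant by expansion along the first column."""
    if not a:
        return 1
    return sum((-1) ** i * a[i][0] * det(minor(a, i, 0)) for i in range(len(a)))


def cramer(a, b):
    """Solve a x = b by x_k = D_k / D, where D_k has column k replaced by b."""
    d = det(a)
    if d == 0:
        raise ValueError("the matrix of the system is singular")
    solution = []
    for k in range(len(a)):
        a_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_k), d))
    return solution


A = [[2, 1, -1],
     [-3, -1, 2],
     [-2, 1, 2]]
b = [8, -11, -3]

x = cramer(A, b)
print(x)
# Substituting back reproduces the constant terms.
print([sum(A[i][j] * x[j] for j in range(3)) for i in range(3)])
```

Cramer's rule requires $n + 1$ determinants, so for large systems Gaussian elimination is far cheaper; the value of (2.35) lies in the explicit formula rather than in computation.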


2.6 Permutations, Symmetric and Antisymmetric Functions 


A careful study of the properties of determinants leads to a number of important 
mathematical concepts relating to arbitrary finite sets that in fact could have been 
presented earlier. 

Let us recall that in Sect. 1.1 we studied linear functions as functions of rows of length $n$. In Sect. 2.2 we looked at determinants as functions of square matrices. If we are interested in the dependence of the determinant on the rows of its underlying matrix, then it is possible to consider it as a function of its $n$ rows: $|A| = F(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n)$, where for the matrix

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\]

we denote by $\mathbf{a}_i$ its $i$th row: $\mathbf{a}_i = (a_{i1}, a_{i2}, \ldots, a_{in})$.

Here we encounter the notion of a function $F(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n)$ of $n$ elements of a set $M$ as a rule that assigns to any $n$ elements from $M$, taken in a particular order, some element of another set $N$. Thus, $F$ is a mapping from $M^n$ to $N$ (see p. xvii). In our case, $M$ is the set of all rows of fixed length $n$, and $N$ is the set of all numbers.

Let us introduce some necessary notation for the sequel. Let $M$ be a finite set consisting of $n$ elements $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$.


Definition 2.21 A function on the n elements of a set M is said to be symmetric if 
it is unchanged under an arbitrary rearrangement of its arguments. 


After numbering the $n$ elements of the set $M$ with the indices $1, 2, \ldots, n$, we can consider that we have arranged them in order of increasing index. A permutation of them can be considered a rearrangement in another order, which we shall write as follows. Let $j_1, j_2, \ldots, j_n$ represent the same numbers $1, 2, \ldots, n$, but perhaps listed in a different order. In this case, we shall say that $(j_1, j_2, \ldots, j_n)$ is a permutation of the numbers $(1, 2, \ldots, n)$. Analogously, we shall say that $(\mathbf{a}_{j_1}, \mathbf{a}_{j_2}, \ldots, \mathbf{a}_{j_n})$ is a permutation of the elements $(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n)$.


Thus the definition of a symmetric function can be written as the equality

\[
F(\mathbf{a}_{j_1}, \mathbf{a}_{j_2}, \ldots, \mathbf{a}_{j_n}) = F(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n) \tag{2.36}
\]

for all permutations $(j_1, j_2, \ldots, j_n)$ of the numbers $(1, 2, \ldots, n)$.

In order to determine whether one is dealing with a symmetric function, it is not necessary to verify equality (2.36) for all permutations $(j_1, j_2, \ldots, j_n)$, but instead we can limit ourselves to certain permutations of the simplest form.


Definition 2.22 A permutation of two elements of the set $(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n)$ is called a transposition.

A transposition under which the $i$th and $j$th elements (that is, $\mathbf{a}_i$ and $\mathbf{a}_j$) are transposed will be denoted by $\tau_{i,j}$. Clearly, we may always assume that $i < j$. We have the following simple fact about permutations.

Theorem 2.23 From any arrangement $(i_1, i_2, \ldots, i_n)$ of distinct natural numbers taking values from 1 to $n$, it is possible to obtain an arbitrary permutation $(j_1, j_2, \ldots, j_n)$ by carrying out a certain number of transpositions.


Proof We shall use induction on $n$. For $n = 1$, the assertion of the theorem is a tautology: there exists only one permutation, and so it is unnecessary to introduce any transpositions at all. In the general case ($n > 1$), let us suppose that $j_1$ stands at the $k$th position in the permutation $(i_1, i_2, \ldots, i_n)$, that is, $j_1 = i_k$. We will perform the transposition $\tau_{1,k}$ on this permutation. If $j_1 = i_1$, then it is not necessary to perform any transposition at all. We obtain the permutation $(j_1, i_2, \ldots, i_1, \ldots, i_n)$, where $j_1$ is in the first position, and $i_1$ is in the $k$th position. Now we need to use transpositions to obtain from the permutation $(j_1, i_2, \ldots, i_1, \ldots, i_n)$ the second permutation, $(j_1, j_2, \ldots, j_n)$, given in the statement of the theorem.

If we cancel $j_1$ from both permutations, then what remains is a permutation of the numbers $\alpha$ such that $1 \le \alpha \le n$ and $\alpha \neq j_1$. To these two permutations now consisting of only $n-1$ numbers, we can apply the induction hypothesis and obtain the second permutation from the first. Beginning with the transposition $\tau_{1,k}$, we can thus obtain from the permutation $(i_1, i_2, \ldots, i_n)$ the permutation $(j_1, j_2, \ldots, j_n)$. In some cases, it will not be necessary to apply a transposition (for example, if $j_1 = i_1$). The limiting case can also be encountered in which it will not be necessary to use any transpositions at all. It is easy to see that such occurs only for $i_1 = j_1$, $i_2 = j_2$, \ldots, $i_n = j_n$. The assertion of the theorem is true in this case, but the set of transpositions used is empty.


This very simple argument can be illustrated as follows. Let us suppose that at a concert, the invited guests sit down in the first row, but not in the order indicated on the administrator's guest list. How can he achieve the requisite ordering? Obviously, he may identify the guest who should be sitting in the first position and ask that person to change seats with the person sitting in the first chair. He will then do likewise with the guests who occupy the second, third, and so on, places, and in the end will have achieved the required order.
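
The seat-swapping procedure is easy to write down as a program. The sketch below follows the argument of Theorem 2.23 position by position; the function names are hypothetical, and transpositions are reported as pairs of 1-based positions $(p, q)$ with $p < q$, which is an assumption of this illustration about how $\tau_{p,q}$ is recorded.

```python
def transpositions_between(start, target):
    """List of position pairs whose successive swaps turn start into target."""
    current = list(start)
    if sorted(current) != sorted(target):
        raise ValueError("start and target must contain the same elements")
    swaps = []
    for p in range(len(current)):
        if current[p] != target[p]:
            # The wanted element sits further right, since earlier seats are done.
            q = current.index(target[p], p + 1)
            current[p], current[q] = current[q], current[p]
            swaps.append((p + 1, q + 1))
    return swaps


def apply_transpositions(start, swaps):
    """Carry out the recorded transpositions, one after another."""
    current = list(start)
    for p, q in swaps:
        current[p - 1], current[q - 1] = current[q - 1], current[p - 1]
    return current


start = (4, 3, 2, 5, 1)
target = (1, 2, 3, 4, 5)
swaps = transpositions_between(start, target)
print(swaps)
print(apply_transpositions(start, swaps))
```

At most $n - 1$ transpositions are produced this way, since once the first $n - 1$ positions are correct, the last one is correct automatically.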

It follows from Theorem 2.23 that in determining that a function is symmetric, it suffices to verify equality (2.36) for permutations obtained from the permutation $(1, 2, \ldots, n)$ by a single transposition, that is, to check that

\[
F(\mathbf{a}_1, \ldots, \mathbf{a}_i, \ldots, \mathbf{a}_j, \ldots, \mathbf{a}_n) = F(\mathbf{a}_1, \ldots, \mathbf{a}_j, \ldots, \mathbf{a}_i, \ldots, \mathbf{a}_n)
\]

for arbitrary $\mathbf{a}_1, \ldots, \mathbf{a}_n$, $i$, and $j$. Indeed, if this property is satisfied, then applying various transpositions successively to the argument of the function $F(\mathbf{a}_1, \ldots, \mathbf{a}_n)$, we will always obtain the same function, and by Theorem 2.23, we will finally obtain the function $F(\mathbf{a}_{j_1}, \ldots, \mathbf{a}_{j_n})$.

For example, for $n = 3$, we have three transpositions: $\tau_{1,2}$, $\tau_{2,3}$, $\tau_{1,3}$. For the function $F(a_1, a_2, a_3) = a_1a_2 + a_1a_3 + a_2a_3$, for example, under the transposition $\tau_{1,2}$, the term $a_1a_2$ remains unchanged, but the other two terms exchange places. The same sort of thing transpires for the other transpositions. Therefore, our function is symmetric.
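
The reduction from all permutations to single transpositions can be tried out numerically. The sketch below evaluates the example function at one sample point, which is a spot check rather than a proof; the helper name `unchanged_by_transpositions` and the sample values are assumptions of the illustration.

```python
from itertools import combinations, permutations


def F(a1, a2, a3):
    """The example function a1*a2 + a1*a3 + a2*a3."""
    return a1 * a2 + a1 * a3 + a2 * a3


def unchanged_by_transpositions(f, args):
    """True if swapping any two arguments leaves the value f(*args) unchanged."""
    base = f(*args)
    for i, j in combinations(range(len(args)), 2):
        swapped = list(args)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        if f(*swapped) != base:
            return False
    return True


point = (2, 7, -3)
print(unchanged_by_transpositions(F, point))
# The value is then the same for all 3! = 6 orderings of the arguments.
print({F(*p) for p in permutations(point)})


def G(a1, a2, a3):
    """A function that is not symmetric, for contrast."""
    return a1 * a2 + a3
```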

We now consider a class of functions that in a certain sense are the opposite of 
symmetric. 


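Definition 2.24 A function on $n$ elements of a set $M$ is said to be antisymmetric if under a transposition of its elements it changes sign.

In other words,

\[
F(\mathbf{a}_1, \ldots, \mathbf{a}_i, \ldots, \mathbf{a}_j, \ldots, \mathbf{a}_n) = -F(\mathbf{a}_1, \ldots, \mathbf{a}_j, \ldots, \mathbf{a}_i, \ldots, \mathbf{a}_n)
\]

for any $\mathbf{a}_1, \ldots, \mathbf{a}_n$, $i$, and $j$.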

The notions of symmetric and antisymmetric function play an extremely important role in mathematics and mathematical physics. For example, in quantum mechanics, the state of a certain physical quantity in a system consisting of $n$ (generally a very large number) elementary particles $p_1, \ldots, p_n$ of a single type is described by a wave function $\Psi(p_1, \ldots, p_n)$ that depends on these particles and assumes complex values. In a certain sense, in the "general case," a wave function is symmetric or antisymmetric, and which of these two possibilities is realized depends only on the type of particle: photons, electrons, and so on. If the wave function is symmetric, then the particles are called bosons, and in this case, we say that the quantum-mechanical system under consideration is subordinate to the Bose-Einstein statistics. On the other hand, if the wave function is antisymmetric, then the particles are called fermions, and we say that the system is subordinate to the Fermi-Dirac statistics.


³For example, photons are bosons, and the particles that make up the atom (electrons, protons, and neutrons) are fermions.




We shall return to a consideration of symmetric and antisymmetric functions in the closing chapters of this book. For now, we would like to answer the following question: How is an antisymmetric function transformed under an arbitrary permutation of the indices? In other words, we would like to express $F(\mathbf{a}_{i_1}, \ldots, \mathbf{a}_{i_n})$ in terms of $F(\mathbf{a}_1, \ldots, \mathbf{a}_n)$ for an arbitrary permutation $(i_1, \ldots, i_n)$ of the indices $(1, \ldots, n)$. To answer this, we again turn to Theorem 2.23, according to which the permutation $(i_1, \ldots, i_n)$ can be obtained from the permutation $(1, \ldots, n)$ via a certain number ($k$, let us say) of transpositions. However, the hallmark of an antisymmetric function is that it changes sign under the transposition of two of its arguments. After $k$ transpositions, therefore, it will have been altered by the sign $(-1)^k$, and we obtain the relationship

\[
F(\mathbf{a}_{i_1}, \ldots, \mathbf{a}_{i_n}) = (-1)^k F(\mathbf{a}_1, \ldots, \mathbf{a}_n), \tag{2.37}
\]

where the collection of elements $\mathbf{a}_{i_1}, \ldots, \mathbf{a}_{i_n}$ from the set $M$ is obtained from the collection $\mathbf{a}_1, \ldots, \mathbf{a}_n$ by means of the permutation under consideration consisting of $k$ transpositions.

The relationship (2.37) has about it a certain ambiguity. Namely, the number $k$ indicates the number of transpositions that are executed in passing from $(1, \ldots, n)$ to the permutation $(i_1, \ldots, i_n)$. But such a passage can in general be accomplished in a variety of ways, and so the required number $k$ of transpositions can assume a number of different values. For example, to pass from $(1, 2, 3)$ to the permutation $(3, 2, 1)$, we could begin with the transposition $\tau_{1,2}$, obtaining $(2, 1, 3)$. Then we could apply the transposition $\tau_{2,3}$ and arrive at the permutation $(2, 3, 1)$. And finally, again carrying out the transposition $\tau_{1,2}$, we would arrive at the permutation $(3, 2, 1)$. Altogether, we carried out three transpositions. On the other hand, we can carry out a single transposition $\tau_{1,3}$, which from $(1, 2, 3)$ gives us immediately the permutation $(3, 2, 1)$. Nevertheless, let us note that we have not produced any inconsistency with (2.37), since both values of $k$, namely 3 and 1, are odd, and therefore in both cases, the coefficient $(-1)^k$ has the same value.
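
The two routes from $(1, 2, 3)$ to $(3, 2, 1)$ can be replayed step by step. In this small sketch, `transpose` is a hypothetical helper that swaps the entries in two 1-based positions, mirroring the action of $\tau_{i,j}$ described above.

```python
def transpose(seq, i, j):
    """Swap the entries in positions i and j (1-based), returning a new tuple."""
    items = list(seq)
    items[i - 1], items[j - 1] = items[j - 1], items[i - 1]
    return tuple(items)


def follow(start, route):
    """Apply the transpositions of a route in order, recording every step."""
    steps = [tuple(start)]
    for i, j in route:
        steps.append(transpose(steps[-1], i, j))
    return steps


long_route = [(1, 2), (2, 3), (1, 2)]   # three transpositions
short_route = [(1, 3)]                  # a single transposition

print(follow((1, 2, 3), long_route))
print(follow((1, 2, 3), short_route))
print((-1) ** len(long_route), (-1) ** len(short_route))  # equal signs
```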

Let us show that the parity of the number of transpositions used in passing from one given permutation to another depends only on the permutations themselves and not on the choice of transpositions. Let us suppose that we have an antisymmetric function $F(\mathbf{a}_1, \ldots, \mathbf{a}_n)$ that depends on $n$ elements of a set $M$ and is not identically zero. This last assumption means that there exists a set of distinct elements $\mathbf{a}_1, \ldots, \mathbf{a}_n$ from the set $M$ such that $F(\mathbf{a}_1, \ldots, \mathbf{a}_n) \neq 0$. On applying the permutation $(i_1, \ldots, i_n)$ to this set of elements, we obtain $(\mathbf{a}_{i_1}, \ldots, \mathbf{a}_{i_n})$, with the values $F(\mathbf{a}_1, \ldots, \mathbf{a}_n)$ and $F(\mathbf{a}_{i_1}, \ldots, \mathbf{a}_{i_n})$ related by (2.37). If we can obtain the permutation $(i_1, \ldots, i_n)$ from $(1, \ldots, n)$ in two different ways, that is, using $k$ and $l$ transpositions, then from formula (2.37) we have the equality $(-1)^k = (-1)^l$, since $F(\mathbf{a}_1, \ldots, \mathbf{a}_n) \neq 0$, and therefore the numbers $k$ and $l$ have the same parity, that is, either both are even or both are odd.

But there is a function known to us that possesses this property, namely the determinant (as a function of the rows of a matrix)! Indeed, Property 2.9 from Sect. 2.2 asserts that the determinant is an antisymmetric function of its rows. This function is nonzero for some $\mathbf{a}_1, \ldots, \mathbf{a}_n$. For example, $|E| = 1$. In other words, to prove our assertion, it suffices to consider the determinant of the matrix $E$ as an antisymmetric function of its $n$ rows $\mathbf{e}_i = (0, \ldots, 1, \ldots, 0)$, where there is a 1 in the $i$th place and zeros in the other places, for $i = 1, \ldots, n$. (In the course of our argument, these rows will be transposed, so that in fact, we shall consider determinants of matrices more complex than $E$.) Thus by a rather roundabout route, using properties of the determinant, we have obtained the following property of permutations.


Theorem 2.25 For any passage from the permutation $(1, \ldots, n)$ to the permutation $J = (j_1, \ldots, j_n)$ by means of transpositions (which is always possible, thanks to Theorem 2.23), the parity of the number of transpositions will be the same as for any other passage between these two permutations.


Thus the set of all permutations of $n$ items can be divided into two classes: those that can be obtained from the permutation $(1, \ldots, n)$ by means of an even number of transpositions and those that can be obtained with an odd number of transpositions. Permutations of the first type are called even, and those of the second type are called odd. If some permutation $J$ is obtained by $k$ transpositions, then we introduce the notation

\[
\varepsilon(J) = (-1)^k.
\]

In other words, for an even permutation $J$, the number $\varepsilon(J)$ is equal to 1, and for an odd permutation, we have $\varepsilon(J) = -1$.

We have proved the consistency of the notion of even and odd permutation in a rather roundabout way, using the properties of the determinant. In fact, it would have sufficed for us to produce any antisymmetric function not identically zero, and we used one that was familiar to us: the determinant as a function of its rows. We could have invoked a simpler function. Let $M$ be a set of numbers, and for $x_1, \ldots, x_n \in M$, we set

\[
F(x_1, \ldots, x_n) = (x_2 - x_1)(x_3 - x_1) \cdots (x_n - x_1) \cdots (x_n - x_{n-1}) = \prod_{i > j} (x_i - x_j). \tag{2.38}
\]

Let us verify that this function is antisymmetric. To this end, we introduce the following lemma.


Lemma 2.26 Any transposition can be obtained as the result of an odd number of transpositions of adjacent elements, that is, transpositions of the form $\tau_{k,k+1}$.


We actually proved this statement in essence in Sect. 2.2 when we derived Prop- 
erty 2.9 from Property 2.6. There we did not use the term “transposition,” and in- 
stead we spoke about interchanging the rows of a determinant. But that very simple 
proof can be applied to the elements of any set, and therefore we shall not repeat the 
argument. 




Thus it suffices to prove that the function (2.38) changes sign under the exchange of $x_k$ and $x_{k+1}$. But in this case, the factors $(x_i - x_j)$ for $i \neq k, k+1$, $j \neq k, k+1$, on the right-hand side of the equation do not change at all. The factors $(x_i - x_k)$ and $(x_i - x_{k+1})$ for $i > k+1$ change places, as do $(x_k - x_j)$ and $(x_{k+1} - x_j)$ for $j < k$ also. There remains a single factor $(x_{k+1} - x_k)$, which changes sign. It is also clear that the function (2.38) differs from zero for any distinct set of values $x_1, \ldots, x_n$.

We can now apply formula (2.37) to the function given by relation (2.38) and repeat the argument by which we proved Theorem 2.25, which means that the notion of the parity of a permutation is well defined. We note, however, that our "simpler" method is very close to our "roundabout" way with which we began, since formula (2.38) defines the Vandermonde determinant of order $n$ (see formula (2.33) in Sect. 2.4). Let us choose the numbers $x_i$ in such a way that $x_1 < x_2 < \cdots < x_n$ (for example, we may set $x_i = i$). Then on the right-hand side of relation (2.38), all factors will be positive.


Let us now write down the analogous relation for $F(x_{i_1}, \ldots, x_{i_n})$. Since the permutation $(i_1, \ldots, i_n)$ assigns the number $x_{i_k}$ to the number $x_k$, from (2.37) we obtain

\[
F(x_{i_1}, \ldots, x_{i_n}) = \prod_{k > l} (x_{i_k} - x_{i_l}). \tag{2.39}
\]

The sign of $F(x_{i_1}, \ldots, x_{i_n})$ is determined by the number of negative factors on the right-hand side of (2.39). Indeed, $F(x_{i_1}, \ldots, x_{i_n}) > 0$ if the number of negative factors is even, while $F(x_{i_1}, \ldots, x_{i_n}) < 0$ if it is odd. Negative factors $(x_{i_k} - x_{i_l})$ arise whenever $x_{i_k} < x_{i_l}$, and in view of the choice $x_1 < x_2 < \cdots < x_n$, this means that $i_k < i_l$. It follows that to the negative factors $(x_{i_k} - x_{i_l})$ there correspond those pairs of numbers $k$ and $l$ for which $k > l$ and $i_k < i_l$. In this case, we say that the numbers $i_k$ and $i_l$ in the permutation $(i_1, \ldots, i_n)$ stand in reverse order, or that they form an inversion. Thus a permutation is even or odd according to whether it contains an even or odd number of inversions. For example, in the permutation $(4, 3, 2, 5, 1)$, the inversions are the pairs $(4, 3)$, $(4, 2)$, $(4, 1)$, $(3, 2)$, $(3, 1)$, $(2, 1)$, $(5, 1)$. In all, there are seven of them, which means that $F(4, 3, 2, 5, 1) < 0$, and the permutation $(4, 3, 2, 5, 1)$ is odd.
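
The inversion count gives a practical way to compute the parity of a permutation. The following minimal sketch lists the inversions of the example permutation and compares the resulting sign with the sign of the product (2.38) evaluated at $x_i = i$; the function names are assumptions of this illustration.

```python
from itertools import permutations
from math import prod


def inversions(perm):
    """Pairs (earlier entry, later entry) in which the earlier entry is larger."""
    n = len(perm)
    return [(perm[l], perm[k])
            for l in range(n) for k in range(l + 1, n) if perm[l] > perm[k]]


def sign_by_inversions(perm):
    """+1 for an even number of inversions, -1 for an odd number."""
    return (-1) ** len(inversions(perm))


def F(xs):
    """Product of x_i - x_j over all i > j, as in (2.38)."""
    return prod(xs[i] - xs[j] for i in range(len(xs)) for j in range(i))


perm = (4, 3, 2, 5, 1)
print(inversions(perm))
print(len(inversions(perm)), sign_by_inversions(perm), F(perm) < 0)

# For every permutation of (1, 2, 3, 4) the two signs agree.
agree = all((F(p) > 0) == (sign_by_inversions(p) == 1)
            for p in permutations(range(1, 5)))
print(agree)
```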
Using these concepts, we can now formulate the following theorem. 


Theorem 2.27 The determinant of a square matrix of order $n$ is the unique function $F(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n)$ of $n$ rows of length $n$ that satisfies the following conditions:


(a) It is linear as a function of an arbitrary row.

(b) It is antisymmetric.

(c) $F(\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n) = 1$, where $\mathbf{e}_i$ is the row with 1 in the $i$th place and zeros in all other places.


This is the most “scientific,” though far from the simplest, definition of the deter- 
minant. 

In this section, we have not presented a single new property of the determinant, instead discussing in detail its property of being an antisymmetric function of its rows. The reason for this is that the property of antisymmetry of the determinant is connected with a large number of questions in mathematics. For example, in Sect. 2.1, we introduced determinants of orders 2 and 3. They have an important geometric significance, expressing the area and volume of simple geometric figures (Figs. 2.1(a) and (b)).

[Fig. 2.2 Path length; points O, B, A, C]

But here we encounter a paradoxical situation: Sometimes, one obtains for the area (or volume) a negative value. It is easy to see that we obtain a positive or negative value for the area of triangle $OAB$ (or the volume of the tetrahedron $OABC$) depending on the order of the vertices $A$, $B$ (or $A$, $B$, $C$). More precisely, the area of triangle $OAB$ is positive if we can obtain the ray $OA$ from $OB$ by rotating it clockwise through the triangle, while the area is negative if we obtain $OA$ by rotating $OB$ counterclockwise through the triangle (in other words, the rotation is always through an angle of measure less than $\pi$). Thus the determinant expresses the area of a triangle (with coefficient $\frac{1}{2}$) with a given ordering of the sides, and the area changes sign if we reverse the order. That is, it is an antisymmetric function.

In the case of volume, choosing the order of the vertices is connected to the 
concept of orientation of space. The same concept appears as well in hyperspaces 
of dimension n > 3, but for now, we shall not go too deeply into such questions; 
we shall return to them in Sects. 4.4 and 7.3. Let us say only that this concept is 
necessary for constructing the theory of volumes and the theory of integration. In 
fact, the notion of orientation arises already in the case n = 1, when we consider 
the length of an interval OA (where O is the origin of the line, namely the point 0, 
and the point A has the coordinate x) to be the determinant x of order 1, which will 
be positive precisely when A lies to the right of O. Analogously, if the point B has 
coordinate y, then the length of the segment AB is equal to y — x, which will be 
positive only if B lies to the right of A. Thus the length of a segment depends on 
the ordering of its endpoints, and it changes sign if the endpoints exchange places 
(thus length is an antisymmetric function). It is only by a similar convention that we 
can say that the length of OABC is equal to the length of OC (Fig. 2.2). And if we 
were to use only positive lengths, then we would end up with the length of OABC 
being given by the expression |OA|+|AB|+|BA|+|AC|=|OC|+2|AB|. 


2.7 Explicit Formula for the Determinant 


Formula (2.12), which we used in Sect. 2.2 to compute the determinant of order $n$, expresses that determinant in terms of determinants of smaller orders. It is assumed that this method can be applied in turn to these smaller determinants, and passing to determinants of smaller and smaller orders, to arrive at a determinant of order 1, which for the matrix $(a_{11})$ is equal to $a_{11}$. We thereby obtain an expression for the determinant of the matrix

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}
\]

in terms of its elements. This expression is rather complicated, and for deriving the properties of determinants it is simpler to use the inductive procedure given in Sect. 2.2. But now we are ready to discover this complicated definition. First of all, let us prove a lemma, which appears obvious at first glance but nonetheless requires proof (though it is very simple).


Lemma 2.28 If the linear function $f(\mathbf{x})$ for a row $\mathbf{x}$ of length $n$ is written in two ways,

\[
f(\mathbf{x}) = \sum_{i=1}^{n} a_i x_i, \qquad f(\mathbf{x}) = \sum_{i=1}^{n} b_i x_i,
\]

then $a_1 = b_1$, $a_2 = b_2$, \ldots, $a_n = b_n$.


Proof Both of the equations for $f(\mathbf{x})$ must hold for arbitrary $\mathbf{x}$. Let us suppose in particular that $\mathbf{x} = \mathbf{e}_i = (0, \ldots, 1, \ldots, 0)$, where 1 is located in the $i$th position (we have already encountered the rows $\mathbf{e}_i$ in the proof of Theorem 1.3). Then from the initial supposition, we obtain that $f(\mathbf{e}_i) = a_i$, and from the second, that $f(\mathbf{e}_i) = b_i$. Therefore, $a_i = b_i$ for all $i$, which is what was to be proved.


We shall consider the determinant $|A|$ as a function of the rows $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ of the matrix $A$. As shown in Sect. 2.2, the determinant is a linear function of any row of the matrix. A function from any number $m$ of rows all of length $n$ is said to be multilinear if it is linear in each row (with the other rows held fixed).


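Theorem 2.29 A multilinear function $F(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m)$ can be expressed in the form

\[
F(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m) = \sum_{(i_1, i_2, \ldots, i_m)} \alpha_{i_1, i_2, \ldots, i_m}\, a_{1 i_1} a_{2 i_2} \cdots a_{m i_m}, \tag{2.40}
\]

if as usual, $\mathbf{a}_i = (a_{i1}, a_{i2}, \ldots, a_{in})$, and the sum is taken over arbitrary collections of numbers $(i_1, i_2, \ldots, i_m)$ from the set $1, 2, \ldots, n$, where $\alpha_{i_1, i_2, \ldots, i_m}$ are certain coefficients that depend only on the function $F$ and not on the rows $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m$.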


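Proof The proof is by induction on the number $m$. For $m = 1$, the proof of the theorem is obvious by the definition of a linear function. For $m > 1$, we shall use the fact that

\[
F(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m) = \sum_{i=1}^{n} \varphi_i(\mathbf{a}_2, \ldots, \mathbf{a}_m)\, a_{1i} \tag{2.41}
\]

for arbitrary $\mathbf{a}_1$, where the coefficients $\varphi_i$ depend on $\mathbf{a}_2, \ldots, \mathbf{a}_m$; that is, they are functions of these rows.

Let us verify that all the functions $\varphi_i$ are multilinear. Let us show, for example, linearity with respect to $\mathbf{a}_2$. Using the linearity of the function $F(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m)$ with respect to $\mathbf{a}_2$, we obtain

\[
F(\mathbf{a}_1, \mathbf{a}_2' + \mathbf{a}_2'', \ldots, \mathbf{a}_m) = F(\mathbf{a}_1, \mathbf{a}_2', \ldots, \mathbf{a}_m) + F(\mathbf{a}_1, \mathbf{a}_2'', \ldots, \mathbf{a}_m),
\]

or

\[
\sum_{i=1}^{n} \varphi_i(\mathbf{a}_2' + \mathbf{a}_2'', \ldots, \mathbf{a}_m)\, x_i = \sum_{i=1}^{n} \bigl(\varphi_i(\mathbf{a}_2', \ldots, \mathbf{a}_m) + \varphi_i(\mathbf{a}_2'', \ldots, \mathbf{a}_m)\bigr)\, x_i
\]

for $x_i = a_{1i}$, that is, for arbitrary $x_i$. From this, by the lemma, we obtain

\[
\varphi_i(\mathbf{a}_2' + \mathbf{a}_2'', \ldots, \mathbf{a}_m) = \varphi_i(\mathbf{a}_2', \ldots, \mathbf{a}_m) + \varphi_i(\mathbf{a}_2'', \ldots, \mathbf{a}_m).
\]

In precisely the same way, we can verify the second property of linear functions in Theorem 1.3. From this theorem it is seen that the functions $\varphi_i(\mathbf{a}_2, \ldots, \mathbf{a}_m)$ are linear with respect to $\mathbf{a}_2$, and analogously that they are multilinear. Now by the induction hypothesis, we have for each of them the expression

\[
\varphi_i(\mathbf{a}_2, \ldots, \mathbf{a}_m) = \sum_{(i_2, \ldots, i_m)} \beta^{i}_{i_2, \ldots, i_m}\, a_{2 i_2} \cdots a_{m i_m} \tag{2.42}
\]

(the index $i$ in $\beta^{i}_{i_2, \ldots, i_m}$ indicates that these constants are connected with the function $\varphi_i$). To complete the proof, it remains for us, changing notation, to set $i = i_1$, to substitute the expressions (2.42) into (2.41), and set $\beta^{i_1}_{i_2, \ldots, i_m} = \alpha_{i_1, i_2, \ldots, i_m}$.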


Remark 2.30 The constants $\alpha_{i_1, i_2, \ldots, i_m}$ in the relationship (2.40) can be found from
the formulas

$$\alpha_{i_1, i_2, \ldots, i_m} = F(e_{i_1}, e_{i_2}, \ldots, e_{i_m}), \tag{2.43}$$

where $e_j$ again denotes the row $(0, \ldots, 1, \ldots, 0)$, in which there is a 1 in the $j$th
position and zeros everywhere else.

Indeed, if we substitute $a_1 = e_{i_1}$, $a_2 = e_{i_2}$, ..., $a_m = e_{i_m}$ in the relationship
(2.40), then the term $a_{1 i_1} a_{2 i_2} \cdots a_{m i_m}$ becomes 1, while the remaining products
$a_{1 j_1} a_{2 j_2} \cdots a_{m j_m}$ are equal to 0. This proves (2.43).


Let us now apply Theorem 2.29 and (2.43) to the determinant $|A|$ as a function
of the rows $a_1, a_2, \ldots, a_n$ of the matrix $A$. Since we know that the determinant is a
multilinear function, it must satisfy the relationship (2.40) (with $m = n$), and the
coefficients $\alpha_{i_1, i_2, \ldots, i_n}$ can be determined from formula (2.43). Consequently, $\alpha_{i_1, i_2, \ldots, i_n}$ is
equal to the determinant $|E_{i_1, i_2, \ldots, i_n}|$ of the matrix whose first row is equal to $e_{i_1}$, the
second to $e_{i_2}$, ..., and the $n$th to $e_{i_n}$. If any of the numbers $i_1, i_2, \ldots, i_n$ are equal,
then $|E_{i_1, i_2, \ldots, i_n}| = 0$, in view of Property 2.10 of Sect. 2.2. It thus remains to examine
the determinant $|E_{i_1, i_2, \ldots, i_n}|$ in the case that $(i_1, i_2, \ldots, i_n)$ is a permutation of
the numbers $(1, 2, \ldots, n)$. But this determinant is obtained from the determinant $|E|$
of the identity matrix if we operate on its rows by the permutation $(i_1, i_2, \ldots, i_n)$.
Furthermore, we know that the determinant is an antisymmetric function of its rows
(see Property 2.9 in Sect. 2.2). Therefore, we can apply to it property (2.37) of
antisymmetric functions, and we obtain

$$|E_{i_1, i_2, \ldots, i_n}| = \varepsilon(I) \cdot |E|, \quad \text{where } I = (i_1, i_2, \ldots, i_n).$$

Since $|E| = 1$, we have the equalities $\alpha_{i_1, i_2, \ldots, i_n} = \varepsilon(I)$ if the permutation $I$ is equal
to $(i_1, i_2, \ldots, i_n)$.
As a result, we obtain an expression for the determinant of the matrix $A$:

$$|A| = \sum_{I} \varepsilon(I) \cdot a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}, \tag{2.44}$$

where the sum ranges over all permutations $I = (i_1, i_2, \ldots, i_n)$ of the numbers
$(1, 2, \ldots, n)$. The expression (2.44) is called the explicit formula for the
determinant. It is worthwhile reformulating this in words:


The determinant of a matrix $A$ is equal to the sum of terms each of which is the product of
$n$ elements $a_{ij}$ of the matrix $A$, taken one from each row and column. If the factors of such
a product are arranged in increasing order of the row numbers, then the term appears with a 
plus or minus sign depending on whether the corresponding column numbers form an even 
or odd permutation. 
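The explicit formula (2.44) can be tried out directly on a small matrix. The following is a minimal sketch in Python, assuming that a matrix is stored as a list of rows and that the sign $\varepsilon(I)$ is computed by counting inversions (an assumption of this sketch: the parity of the number of inversions agrees with the parity of the permutation). The names `sign` and `det_explicit` are illustrative and do not come from the text.

```python
from itertools import permutations

def sign(perm):
    """Return +1 for an even permutation and -1 for an odd one (0-based indices)."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def det_explicit(a):
    """Determinant by formula (2.44): one term per permutation of the column indices."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        # Factors are taken in increasing order of the row numbers.
        for row, col in enumerate(perm):
            term *= a[row][col]
        total += term
    return total

A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
print(det_explicit(A))  # 39
```

The sum has $n!$ terms, so this sketch is only practical for small $n$; it is meant to illustrate the formula, not to serve as an efficient method.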


2.8 The Rank of a Matrix 


In this section, we introduce several fundamental concepts and use them to prove 
several new results about systems of linear equations. 


Definition 2.31 A matrix whose $i$th row coincides with the $i$th column of a matrix
$A$ for all $i$ is called the transpose of the matrix $A$ and is denoted by $A^*$.


It is clear that if we denote by $a_{ij}$ the element located in the $i$th row and $j$th
column of the matrix $A$, and by $b_{ij}$ the corresponding element of the matrix $A^*$,
then $b_{ij} = a_{ji}$. If the matrix $A$ is of type $(n, m)$, then $A^*$ is of type $(m, n)$.


Theorem 2.32 The determinant of the transpose of a square matrix is equal to the
determinant of the original matrix. That is, $|A^*| = |A|$.


Proof Consider the following function of a matrix $A$:

$$F(A) = |A^*|.$$

This function exhibits properties 1 and 2 formulated in Sect. 2.3 (page 37). Indeed,
the rows of the matrix $A^*$ are the columns of $A$, and thus the assertion that the
function $F(A)$ (that is, the determinant $|A^*|$ as a function of the matrix $A$) possesses
properties 1 and 2 for the rows of the matrix $A$ is equivalent to the assertion that the
determinant $|A^*|$ possesses the same properties for its columns. This follows from
Theorem 2.17. Therefore, Theorem 2.15 is applicable to $F(A)$, whence

$$F(A) = k|A|,$$

where $k = F(E) = |E^*|$, with $E$ the $n \times n$ identity matrix. Clearly, $E^* = E$, and
therefore, $k = |E^*| = |E| = 1$. It follows that $F(A) = |A|$, which completes the
proof of the theorem.


Definition 2.33 A square matrix $A$ is said to be symmetric if $A = A^*$, and
antisymmetric if $A = -A^*$.


It is clear that if $a_{ij}$ denotes the element located in the $i$th row and $j$th column of
a matrix $A$, then the condition $A = A^*$ can be written in the form $a_{ij} = a_{ji}$, while
$A = -A^*$ can be written as $a_{ij} = -a_{ji}$. From this last relationship, it follows that all
elements $a_{ii}$ on the main diagonal of an antisymmetric matrix must be equal to zero.
Furthermore, it follows from the properties of the determinant that an antisymmetric
matrix of odd order is singular. Indeed, if $A$ is a square matrix of order $n$, then from
the definition of multiplication of a matrix by a number, the linearity of the
determinant in each row, and Theorem 2.32, we obtain the relationship
$|-A^*| = (-1)^n |A^*| = (-1)^n |A|$, from which
$A = -A^*$ yields $|A| = (-1)^n |A|$, which in the case of odd $n$ is possible only if
$|A| = 0$.
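Both Theorem 2.32 and the statement about antisymmetric matrices of odd order can be checked on small examples. The sketch below is only an illustration, assuming matrices stored as lists of rows; the helper names `transpose` and `det` are hypothetical, and the determinant is computed by expansion along the first row, which is adequate only for small matrices.

```python
def transpose(a):
    """Rows of the result are the columns of a."""
    return [list(col) for col in zip(*a)]

def det(a):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(a) == 1:
        return a[0][0]
    total = 0
    for j, entry in enumerate(a[0]):
        # Delete the first row and the j-th column.
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]
print(det(A) == det(transpose(A)))   # Theorem 2.32 on this example

# An antisymmetric matrix of odd order 3: S equals minus its transpose.
S = [[0, 2, -1],
     [-2, 0, 4],
     [1, -4, 0]]
is_antisymmetric = S == [[-x for x in row] for row in transpose(S)]
print(is_antisymmetric, det(S))

# For even order the determinant need not vanish.
T = [[0, 1],
     [-1, 0]]
print(det(T))
```

The last example shows that the restriction to odd order is essential: an antisymmetric matrix of order 2 can be nonsingular.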

Symmetric and antisymmetric matrices play an important role in mathematics 
and physics, and we shall encounter them in the following chapters, for example in 
the study of bilinear forms. 


Definition 2.34 A minor of order $r$ of a matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \tag{2.45}$$

is a determinant of order $r$ obtained from the matrix (2.45) by eliminating all entries
of the matrix except for those simultaneously in $r$ given rows and $r$ given columns.
Here we clearly must assume that $r \le m$ and $r \le n$.




For example, the minors of order 1 are the individual elements of the matrix,
while the unique minor of order $n$ of a square matrix of order $n$ is the determinant
of the entire matrix.


Definition 2.35 The rank of matrix (2.45) is the maximum over the orders of its
nonzero minors.


In other words, the rank is the smallest number $r$ such that all the minors of order
$s > r$ are equal to zero or there are no such minors (if $r = \min\{m, n\}$).
Let us note one obvious corollary of Theorem 2.32. 


Theorem 2.36 The rank of a matrix is not affected by taking the transpose.

Proof The minors of the matrix $A^*$ are obtained as the transposes of the minors
of the matrix $A$ (in taking the transpose, the indices of the rows and columns change
places), and by Theorem 2.32 a minor and its transpose are equal. Therefore, the ranks
of the matrices $A^*$ and $A$ coincide.
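Definition 2.35 can be turned into a direct, if very inefficient, computation: examine the minors of each order, starting from the largest, until a nonzero one is found. The following is a sketch under the assumption that matrices are lists of rows; `det` and `rank_by_minors` are illustrative names, not notation from the text.

```python
from itertools import combinations

def det(a):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(a) == 1:
        return a[0][0]
    return sum(
        (-1) ** j * entry * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j, entry in enumerate(a[0])
    )

def rank_by_minors(a):
    """Largest order r for which some minor of order r is nonzero."""
    m, n = len(a), len(a[0])
    for r in range(min(m, n), 0, -1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                minor = [[a[i][j] for j in cols] for i in rows]
                if det(minor) != 0:
                    return r
    return 0  # every entry is zero

A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
A_transposed = [list(col) for col in zip(*A)]
print(rank_by_minors(A), rank_by_minors(A_transposed))
```

On this example both calls give the same value, in agreement with Theorem 2.36. The number of minors grows very quickly with the size of the matrix, which is one reason for computing the rank by Gaussian elimination instead.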


Let us recall that in presenting the method of Gaussian elimination in Sect. 1.2, 
we introduced elementary row operations of types I and II on the equations of a 
system. These operations changed both the coefficients of the unknowns and the 
constant terms. If we now focus our attention solely on the coefficients of the un- 
knowns, then we may say that we are carrying out elementary operations on the rows 
of the matrix of the system. This gives us the possibility of using Gauss’s method to 
determine the rank of a matrix. 

A fundamental property of the rank of a matrix is expressed in the following 
theorem. 


Theorem 2.37 The rank of a matrix is unchanged under elementary operations on 
its rows and columns. 


Proof We shall carry out the proof for elementary row operations of type II (for
type I, the proof is analogous, and even simpler). After adding $p$ times the $j$th row
of the matrix $A$ to the $i$th row, we obtain a new matrix; call it $B$. We shall denote the
rank of a matrix by the operator $\operatorname{rk}$ and suppose that $\operatorname{rk} A = r$. If among the nonzero
minors of order $r$ of the matrix $A$ there is at least one not containing the $i$th row,
then it will not be altered by the given operation, and it follows that it will be a
nonzero minor of the matrix $B$. Therefore, we may conclude that $\operatorname{rk} B \ge r$.

Now let us suppose that all nonzero minors of order $r$ of the matrix $A$ contain
the $i$th row. Let $M$ be one such minor, involving rows numbered $i_1, \ldots, i_r$, where
$i_k = i$ for some $k$, $1 \le k \le r$. Let us denote by $N$ the minor of the matrix $B$ involving
the rows and columns with the same indices as $M$. If $j$ coincides with one of the numbers
$i_1, \ldots, i_r$, then this transformation of the matrix $A$ is also an elementary
transformation of the minor $M$, under which it is converted into $N$. Since the determinant
is unaffected by an elementary transformation of type II, we must have $N = M$,
whence it follows that $\operatorname{rk} B \ge r$.




Now suppose that $j$ does not coincide with any of the numbers $i_1, \ldots, i_r$. Let
us denote by $M'$ the minor of the matrix $A$ involving the same columns as $M$ and
rows numbered $i_1, \ldots, i_{k-1}, j, i_{k+1}, \ldots, i_r$. In other words, $M'$ is obtained from $M$
by replacing the $i$th row by the $j$th row of the matrix $A$. Since the determinant is a
linear function of its rows, we therefore have the equality $N = M + pM'$. But by
our assumption, $M' = 0$, since the minor $M'$ does not contain the $i$th row of the
matrix $A$. Thus we obtain the equality $N = M$, from which it follows that $\operatorname{rk} B \ge r$.

Thus in all cases we have proved that $\operatorname{rk} B \ge \operatorname{rk} A$. However, since the matrix $A$,
in turn, can be obtained from $B$ by means of an elementary operation of type II, we
have the reverse inequality $\operatorname{rk} A \ge \operatorname{rk} B$. From this, it clearly follows that $\operatorname{rk} A = \operatorname{rk} B$.

By similar arguments, but carried out for operations on the columns, we can 
show that the rank of a matrix is unchanged under elementary column operations. 
Furthermore, the assertion for the columns follows from analogous assertions about 
the rows if we make use of Theorem 2.36. 
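Theorem 2.37 is what justifies computing the rank by Gaussian elimination. The sketch below assumes matrices stored as lists of rows and uses exact rational arithmetic from the standard `fractions` module so that no rounding is involved. The function names `rank` and `add_multiple_of_row` are hypothetical; the elimination uses row interchanges and the addition of a multiple of one row to another.

```python
from fractions import Fraction

def rank(a):
    """Rank computed by bringing the matrix to echelon form with row operations."""
    rows = [[Fraction(x) for x in row] for row in a]
    r = 0  # number of pivot rows found so far
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def add_multiple_of_row(a, i, j, p):
    """Return a copy of a in which p times row j has been added to row i."""
    b = [row[:] for row in a]
    b[i] = [x + p * y for x, y in zip(a[i], a[j])]
    return b

A = [[1, 2, 3],
     [2, 4, 6],
     [1, 0, 1]]
B = add_multiple_of_row(A, 0, 2, 5)
print(rank(A), rank(B))
```

Both ranks printed here coincide, as the theorem asserts for the operation of adding a multiple of one row to another.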


Now we are in a position to formulate answers to the questions that were resolved 
earlier by Theorems 1.16 and 1.17, without reducing the system to echelon form 
but instead using explicit expressions that depend on the coefficients. Bringing the 
system into echelon form will be present in our proofs, but will not appear in the 
final formulations. 

Let us assume that by elementary operations, we have brought a system of
equations into echelon form (1.18). By Theorem 2.37, both the rank of the matrix of
the system and the rank of the augmented matrix will have remained unchanged.
Clearly, the rank of the matrix of (1.18) is equal to $r$: a minor at the intersection of
the first $r$ rows and the $r$ columns numbered $1, k, \ldots, s$ is equal to $\overline{a}_{11} \overline{a}_{2k} \cdots \overline{a}_{rs}$,
which implies that it is different from zero, and any minor of greater order
must contain a row of zeros and is therefore equal to zero. Therefore, the rank of the
matrix of the initial system (1.3) is equal to $r$.

The rank of the augmented matrix of system (1.18) is also equal to $r$ if all the
constants $\overline{b}_{r+1}, \ldots, \overline{b}_m$ are equal to zero or if there are no equations with such
numbers ($m = r$). However, if at least one of the numbers $\overline{b}_{r+1}, \ldots, \overline{b}_m$ is different
from zero, then the rank of the augmented matrix will be greater than $r$. For
example, if $\overline{b}_{r+1} \neq 0$, then the minor of order $r + 1$ involving the first $r + 1$ rows
of the augmented matrix and the columns numbered $1, k, \ldots, s, n + 1$ is equal to
$\overline{a}_{11} \overline{a}_{2k} \cdots \overline{a}_{rs} \overline{b}_{r+1}$ and is different from zero. Thus the compatibility criterion
formulated in Theorem 1.16 can also be expressed in terms of the rank: the rank of
the matrix of system (1.3) must be equal to the rank of the augmented matrix of the
system. Since by Theorem 2.37, the ranks of the matrix and augmented matrix of the
initial system (1.3) are equal to the ranks of the corresponding matrices of (1.18),
we obtain the compatibility condition called the Rouché-Capelli theorem.


Theorem 2.38 The system of linear equations (1.3) is consistent if and only if the 
rank of the matrix of the system is equal to the rank of the augmented matrix. 


The same considerations make it possible to reformulate Theorem 1.17 in the 
following form. 




Theorem 2.39 If the system of linear equations (1.3) is consistent, then it is definite
(that is, it has a unique solution) if and only if the rank of the matrix of the system
is equal to the number of unknowns.
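Theorems 2.38 and 2.39 together give a test that uses only the two ranks. The following sketch assumes the coefficients are given as a list of rows and the constants as a separate list; the function names `rank` and `classify`, and the returned labels, are choices made for this illustration only.

```python
from fractions import Fraction

def rank(a):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(x) for x in row] for row in a]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def classify(coefficients, constants):
    """Compare the rank of the matrix of the system with that of the augmented matrix."""
    augmented = [row + [b] for row, b in zip(coefficients, constants)]
    r = rank(coefficients)
    if r != rank(augmented):
        return "inconsistent"          # Theorem 2.38
    unknowns = len(coefficients[0])
    return "definite" if r == unknowns else "indefinite"   # Theorem 2.39

print(classify([[1, 1], [1, -1]], [2, 0]))   # x + y = 2, x - y = 0
print(classify([[1, 1], [2, 2]], [1, 2]))    # x + y = 1, 2x + 2y = 2
print(classify([[1, 1], [1, 1]], [1, 2]))    # x + y = 1, x + y = 2
```

Here "indefinite" is used for a consistent system whose solution is not unique.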


We can explain further the significance of the concept of the rank of a matrix in 
the theory of linear equations by introducing a further notion, one that is important 
in and of itself. 


Definition 2.40 Suppose we are given $m$ rows of a given length $n$: $a_1, a_2, \ldots, a_m$.
A row $a$ of the same length is said to be a linear combination of $a_1, a_2, \ldots, a_m$ if
there exist numbers $p_1, p_2, \ldots, p_m$ such that $a = p_1 a_1 + p_2 a_2 + \cdots + p_m a_m$.


Let us mention two properties of linear combinations. 


1. If $a$ is a linear combination of the rows $a_1, \ldots, a_m$, each of which, in turn, is a
linear combination of the same set of rows $b_1, \ldots, b_k$, then $a$ is a linear
combination of the rows $b_1, \ldots, b_k$.

Indeed, by the definition of a linear combination, there exist numbers $q_{ij}$ such
that

$$a_i = q_{i1} b_1 + q_{i2} b_2 + \cdots + q_{ik} b_k, \quad i = 1, \ldots, m,$$

and numbers $p_i$ such that $a = p_1 a_1 + p_2 a_2 + \cdots + p_m a_m$. Substituting in the
last equality the expression for the rows $a_i$ in terms of $b_1, \ldots, b_k$, we obtain

$$a = p_1 (q_{11} b_1 + q_{12} b_2 + \cdots + q_{1k} b_k) + p_2 (q_{21} b_1 + q_{22} b_2 + \cdots + q_{2k} b_k) + \cdots + p_m (q_{m1} b_1 + q_{m2} b_2 + \cdots + q_{mk} b_k).$$

Removing parentheses and collecting like terms yields

$$a = (p_1 q_{11} + p_2 q_{21} + \cdots + p_m q_{m1}) b_1 + (p_1 q_{12} + p_2 q_{22} + \cdots + p_m q_{m2}) b_2 + \cdots + (p_1 q_{1k} + p_2 q_{2k} + \cdots + p_m q_{mk}) b_k,$$

that is, the expression of $a$ as a linear combination of the rows $b_1, \ldots, b_k$.
2. When we apply elementary operations to the rows of a matrix, we obtain rows
that are linear combinations of the rows of the original matrix.
This is obvious for elementary operations both of type I and of type II.
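The first property can be followed step by step on numbers. The sketch below is an illustration only, assuming rows are stored as Python lists; `linear_combination` is a hypothetical helper, and the particular rows and coefficients are arbitrary.

```python
def linear_combination(coefficients, rows):
    """Return c_1 * rows[0] + c_2 * rows[1] + ... as a new row."""
    length = len(rows[0])
    return [sum(c * row[t] for c, row in zip(coefficients, rows)) for t in range(length)]

b = [[1, 0, 2],
     [0, 1, -1]]              # rows b_1, b_2
q = [[1, 2],
     [3, -1],
     [0, 4]]                  # a_i = q[i][0] * b_1 + q[i][1] * b_2
p = [2, 1, -1]                # a = p_1 a_1 + p_2 a_2 + p_3 a_3

a_rows = [linear_combination(q_i, b) for q_i in q]
a = linear_combination(p, a_rows)

# Coefficients of a with respect to b_1, b_2, obtained by collecting like terms:
# s_j = p_1 q_1j + p_2 q_2j + p_3 q_3j.
s = [sum(p[i] * q[i][j] for i in range(len(p))) for j in range(len(b))]
print(a, linear_combination(s, b))
```

Both printed rows coincide, which is exactly the content of the computation in the proof of the first property.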


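Let us apply Gaussian elimination to a certain matrix $A$ of rank $r$. Changing the
numeration of the rows and columns, we may assume that a nonzero minor of order
$r$ is located in the first $r$ rows and $r$ columns of the matrix. Then by elementary
operations on its first $r$ rows, the matrix is put into the form

$$\overline{A} = \begin{pmatrix}
\overline{a}_{11} & \overline{a}_{12} & \cdots & \overline{a}_{1r} & \overline{a}_{1\,r+1} & \cdots & \overline{a}_{1n} \\
0 & \overline{a}_{22} & \cdots & \overline{a}_{2r} & \overline{a}_{2\,r+1} & \cdots & \overline{a}_{2n} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \overline{a}_{rr} & \overline{a}_{r\,r+1} & \cdots & \overline{a}_{rn} \\
\overline{a}_{r+1\,1} & \overline{a}_{r+1\,2} & \cdots & \overline{a}_{r+1\,r} & \overline{a}_{r+1\,r+1} & \cdots & \overline{a}_{r+1\,n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
\overline{a}_{m1} & \overline{a}_{m2} & \cdots & \overline{a}_{mr} & \overline{a}_{m\,r+1} & \cdots & \overline{a}_{mn}
\end{pmatrix},$$

where $\overline{a}_{11} \neq 0$, ..., $\overline{a}_{rr} \neq 0$. We can now subtract from the $(r+1)$st row the first
row multiplied by a number such that the first element of the row thus obtained
is equal to zero, then the second row multiplied by a number such that the second
element of the row thus obtained equals zero, and so on, and proceed in the same way
with each of the following rows, until we obtain the matrix

$$\overline{\overline{A}} = \begin{pmatrix}
\overline{a}_{11} & \overline{a}_{12} & \cdots & \overline{a}_{1r} & \overline{a}_{1\,r+1} & \cdots & \overline{a}_{1n} \\
0 & \overline{a}_{22} & \cdots & \overline{a}_{2r} & \overline{a}_{2\,r+1} & \cdots & \overline{a}_{2n} \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \overline{a}_{rr} & \overline{a}_{r\,r+1} & \cdots & \overline{a}_{rn} \\
0 & 0 & \cdots & 0 & \overline{\overline{a}}_{r+1\,r+1} & \cdots & \overline{\overline{a}}_{r+1\,n} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 0 & \overline{\overline{a}}_{m\,r+1} & \cdots & \overline{\overline{a}}_{mn}
\end{pmatrix}.$$

Since the matrix $\overline{\overline{A}}$ was obtained from $A$ using a sequence of elementary operations,
its rank must be equal to $r$.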

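Let us show that the entire $(r+1)$st row of the matrix $\overline{\overline{A}}$ consists of zeros. Indeed,
if there were an element $\overline{\overline{a}}_{r+1\,k} \neq 0$ in this row for some $k = 1, \ldots, n$, then the minor
of the matrix $\overline{\overline{A}}$ formed by the intersection of the first $r + 1$ rows and the columns
numbered $1, 2, \ldots, r, k$ would be given by

$$\begin{vmatrix}
\overline{a}_{11} & \overline{a}_{12} & \cdots & \overline{a}_{1r} & \overline{a}_{1k} \\
0 & \overline{a}_{22} & \cdots & \overline{a}_{2r} & \overline{a}_{2k} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \overline{a}_{rr} & \overline{a}_{rk} \\
0 & 0 & \cdots & 0 & \overline{\overline{a}}_{r+1\,k}
\end{vmatrix} = \overline{a}_{11} \overline{a}_{22} \cdots \overline{a}_{rr}\, \overline{\overline{a}}_{r+1\,k} \neq 0,$$

which contradicts the established fact that the rank of $\overline{\overline{A}}$ is equal to $r$.

This result can be formulated thus: If $\overline{a}_1, \ldots, \overline{a}_{r+1}$ are the first $r + 1$ rows of the
matrix $\overline{A}$, then there exist numbers $p_1, \ldots, p_r$ such that

$$\overline{a}_{r+1} - p_1 \overline{a}_1 - \cdots - p_r \overline{a}_r = 0.$$

From this, it follows that $\overline{a}_{r+1} = p_1 \overline{a}_1 + \cdots + p_r \overline{a}_r$. That is, the row $\overline{a}_{r+1}$ is a linear
combination of the first $r$ rows of the matrix $\overline{A}$. But the matrix $\overline{A}$ was obtained as
the result of elementary operations on the first $r$ rows of the matrix $A$, whence it
follows that all rows of the matrices $A$ and $\overline{A}$ numbered greater than $r$ coincide.
We see, therefore, that the $(r+1)$st row of the matrix $A$ is a linear combination of
the rows $\overline{a}_1, \ldots, \overline{a}_r$, each of which, in turn, is a linear combination of the first $r$
rows of the matrix $A$. Consequently, the $(r+1)$st row of the matrix $A$ is a linear
combination of its first $r$ rows.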

This line of reasoning carried out for the (r + 1)st row can be applied equally 
well to any row numbered i > r. Therefore, every row of the matrix A is a linear 
combination of its first r rows (note that in this case, the first r rows played a special 
role, since for notational convenience, we numbered the rows and columns in such 
a way that a nonzero minor was located in the first r rows and first r columns). In 
the general case, we obtain the following result. 


Theorem 2.41 If the rank of a matrix is equal to $r$, then all of its rows are linear
combinations of some $r$ rows.


Remark 2.42 To put it more precisely, we have shown that if there exists a nonzero 
minor of order equal to the rank of the matrix, then every row can be written as a 
linear combination of the rows in which this minor is located. 
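Remark 2.42 also suggests how the coefficients of such a linear combination might be found in practice: since the minor is nonzero, it is enough to match the entries in the columns of the minor. The sketch below rests on that observation and on the assumption that matrices are lists of rows; `solve_square` and `combination_coefficients` are hypothetical helpers, and the rows and columns of the minor are supplied by hand.

```python
from fractions import Fraction

def solve_square(m, rhs):
    """Solve m x = rhs for a nonsingular square matrix m (Gauss-Jordan, exact arithmetic)."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(m, rhs)]
    for col in range(n):
        # Move a row with a nonzero entry in this column into the pivot position.
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [v - factor * w for v, w in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

def combination_coefficients(a, rows, cols, target):
    """Numbers p_t with a[target] = sum of p_t * a[rows[t]], matched on the minor's columns."""
    system = [[a[i][c] for i in rows] for c in cols]
    rhs = [a[target][c] for c in cols]
    return solve_square(system, rhs)

A = [[1, 2, 3],
     [0, 1, 1],
     [2, 5, 7]]
rows, cols = (0, 1), (0, 1)    # the minor in these rows and columns equals 1; the rank of A is 2
p = combination_coefficients(A, rows, cols, target=2)
rebuilt = [sum(p[t] * A[rows[t]][c] for t in range(len(rows))) for c in range(len(A[0]))]
print(p, rebuilt == A[2])
```

The coefficients are determined from two columns only, yet the resulting combination reproduces the whole third row, as the remark predicts for a matrix of rank 2.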


The application of these ideas to systems of linear equations is based on the
following obvious lemma. Here, as in a high-school course, we shall call the equation
$F(x) = b$ a corollary of equations (1.10) if every solution $c$ of the system (1.10)
satisfies the relationship $F(c) = b$. In other words, this means that if we assign to the
system (1.10) one additional equation $F(x) = b$, we obtain an equivalent system.


Lemma 2.43 If in the augmented matrix of the system (1.3), some row (say with
index $l$) is a linear combination of $k$ rows, with indices $i_1, \ldots, i_k$, then the $l$th equation
of the system is a corollary of the $k$ equations with those indices.


Proof The proof proceeds by direct verification. To simplify the presentation, let us
assume that we are talking about the first $k$ rows of the augmented matrix. Then by
definition, there exist $k$ numbers $\alpha_1, \ldots, \alpha_k$ such that

$$\alpha_1 (a_{11}, a_{12}, \ldots, a_{1n}, b_1) + \alpha_2 (a_{21}, a_{22}, \ldots, a_{2n}, b_2) + \cdots + \alpha_k (a_{k1}, a_{k2}, \ldots, a_{kn}, b_k) = (a_{l1}, a_{l2}, \ldots, a_{ln}, b_l).$$

This means that the following equations are satisfied:

$$\alpha_1 a_{1i} + \alpha_2 a_{2i} + \cdots + \alpha_k a_{ki} = a_{li} \quad \text{for } i = 1, 2, \ldots, n,$$

$$\alpha_1 b_1 + \alpha_2 b_2 + \cdots + \alpha_k b_k = b_l.$$

Then if we multiply equations numbered $1, 2, \ldots, k$ in our system by the numbers
$\alpha_1, \ldots, \alpha_k$ respectively and add the products, we obtain the $l$th equation of the
system. That is, in the notation of (1.10), we obtain

$$\alpha_1 F_1(x) + \cdots + \alpha_k F_k(x) = F_l(x), \qquad \alpha_1 b_1 + \cdots + \alpha_k b_k = b_l.$$

Substituting here $x = c$, we obtain that if $F_1(c) = b_1$, ..., $F_k(c) = b_k$, then we have
also $F_l(c) = b_l$. That is, the $l$th equation is a corollary of the first $k$ equations.


By combining Lemma 2.43 with Theorem 2.41, we obtain the following result. 


Theorem 2.44 If the rank of the matrix of system (1.3) coincides with the rank of
its augmented matrix and is equal to $r$, then all the equations of the system are
corollaries of some $r$ equations of the system.


Therefore, if the rank of the matrix of the consistent system (1.3) is equal to $r$,
then it is equivalent to a system consisting of some $r$ equations of system (1.3). It is
possible to select as these $r$ equations any such that in the rows with corresponding
indices there occurs a nonzero minor of order $r$ of the matrix of the system (1.3).


2.9 Operations on Matrices 


In this section, we shall define certain operations on matrices that while simple, are 
very important for the following presentation. First, we shall define these operations 
purely formally. Their deeper significance will become clear in the examples pre- 
sented below, and above all, in the following chapter, where matrices are connected 
to geometric concepts by linear transformations of vector spaces. 

First of all, let us agree that by the equality $A = B$ for two matrices is meant
that $A$ and $B$ are matrices of the same type and that their elements (denoted by $a_{ij}$
and $b_{ij}$) with like indices are equal. That is, if $A$ and $B$ each have $m$ rows and $n$
columns, then to write $A = B$ means that the $m \cdot n$ equalities $a_{ij} = b_{ij}$ hold for all
indices $i = 1, \ldots, m$ and $j = 1, \ldots, n$.


Definition 2.45 Let $A$ be an arbitrary matrix of type $(m, n)$ with elements $a_{ij}$, and
let $p$ be some number. The product of the matrix $A$ and the number $p$ is the matrix
$B$, also of type $(m, n)$, whose elements satisfy the equations $b_{ij} = p a_{ij}$. It is denoted
by $pA$.

Just as is done for numbers, the matrix obtained by multiplying A by the number 
—1 is denoted by —A and is called the additive inverse or opposite. In the case of the 
product obtained by multiplying an arbitrary matrix of type (m, n) by the number 0, 
we obviously obtain a matrix of the same type, all of whose elements are zero. It is 
called the null or zero matrix of type (m,n) and is denoted by 0. 




Definition 2.46 Let $A$ and $B$ be two matrices, each of type $(m, n)$, with elements
denoted as usual by $a_{ij}$ and $b_{ij}$. The sum of $A$ and $B$ is the matrix $C$, also of type
$(m, n)$, whose elements $c_{ij}$ are defined by the formula $c_{ij} = a_{ij} + b_{ij}$. This is written
as the equality $C = A + B$.


Let us emphasize that both sum and equality are defined only for matrices of the 
same type. 

With these definitions in hand, it is now easy to verify that just as in the case
of numbers, one has the following rules for removing parentheses: $(p + q)A =
pA + qA$ for any two numbers $p$, $q$ and matrix $A$, as well as $p(A + B) = pA + pB$
for any number $p$ and matrices $A$, $B$ of the same type. It is just as easily verified that
the addition of matrices does not depend on the order of summation, $A + B = B + A$,
and that the sum of three (or more) matrices does not depend on the arrangement of
parentheses, that is, $(A + B) + C = A + (B + C)$. Using addition and multiplication
by $-1$, it is possible as well to define the difference of matrices: $A - B = A + (-B)$.
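These rules are easy to try out numerically. The following sketch assumes matrices stored as lists of rows; the function names `scale`, `add`, and `subtract` are illustrative, and the check that both matrices are of the same type mirrors the restriction stated above.

```python
def scale(p, a):
    """Product of the number p and the matrix a (Definition 2.45)."""
    return [[p * x for x in row] for row in a]

def add(a, b):
    """Sum of two matrices of the same type (Definition 2.46)."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("the sum is defined only for matrices of the same type")
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def subtract(a, b):
    """Difference A - B, defined as A + (-B)."""
    return add(a, scale(-1, b))

A = [[1, 2, 3], [4, 5, 6]]
B = [[0, -1, 2], [7, 1, 1]]
C = [[1, 1, 1], [2, 0, -2]]
p, q = 3, -2

print(scale(p + q, A) == add(scale(p, A), scale(q, A)))     # (p + q)A = pA + qA
print(scale(p, add(A, B)) == add(scale(p, A), scale(p, B)))  # p(A + B) = pA + pB
print(add(A, B) == add(B, A))                                # A + B = B + A
print(add(add(A, B), C) == add(A, add(B, C)))                # (A + B) + C = A + (B + C)
print(subtract(A, A) == scale(0, A))                         # A - A is the zero matrix
```

Each line prints `True` for these integer matrices; a handful of examples illustrates the rules but of course does not replace the verification from the definitions.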

We now define another, the most important of all, operation on matrices, called 
the matrix product or matrix multiplication. Like addition, this operation is defined 
not for matrices of arbitrary type, but only for those whose dimensions obey a certain 
relationship. 


Definition 2.47 Let $A$ be a matrix of type $(m, n)$, whose elements we shall denote
by $a_{ij}$, and let $B$ be a matrix of type $(n, k)$ with elements $b_{ij}$ (we observe that here
in general, the indices $i$ and $j$ of the elements $a_{ij}$ and $b_{ij}$ run over different sets
of values). The product of matrices $A$ and $B$ is the matrix $C$ of type $(m, k)$ whose
elements $c_{ij}$ are determined by the formula

$$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}. \tag{2.46}$$

We write the matrix product as $C = A \cdot B$ or simply $C = AB$.


Thus the product of two rectangular matrices A and B is defined only in the case 
that the number of columns of matrix A is equal to the number of rows of matrix B, 
while otherwise, the product is undefined (the reason for this will become clear in 
the following chapter). The important special case n = m = k shows that the product 
of two (and therefore, an arbitrary number of) square matrices of the same order is 
well defined. 
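Formula (2.46) translates almost literally into code. The sketch below assumes matrices stored as lists of rows, with 0-based indices in place of the 1-based indices of the text; the name `matmul` is an illustrative choice.

```python
def matmul(a, b):
    """Product C = AB by formula (2.46); A is of type (m, n) and B of type (n, k)."""
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    k = len(b[0])
    return [
        [sum(a[i][t] * b[t][j] for t in range(n)) for j in range(k)]
        for i in range(len(a))
    ]

A = [[1, 2, 3],
     [4, 5, 6]]          # type (2, 3)
B = [[1, 0],
     [0, 1],
     [2, -1]]            # type (3, 2)
print(matmul(A, B))      # [[7, -1], [16, -1]], a matrix of type (2, 2)
```

The explicit check on the dimensions reflects the fact that the product is undefined unless the number of columns of the first factor equals the number of rows of the second.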

Let us clarify the above definition with the help of some examples. 


Example 2.48 In what follows, we shall frequently encounter matrices of types
$(1, n)$ and $(n, 1)$, that is, rows and columns of length $n$, often called row vectors
and column vectors. For such vectors it is convenient to introduce special notation:

$$\alpha = (\alpha_1, \ldots, \alpha_n), \qquad [\beta] = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix}, \tag{2.47}$$

that is, $\alpha$ is a matrix of type $(1, n)$, while $[\beta]$ is a matrix of type $(n, 1)$. Such matrices
are clearly related by the transpose operator: $[\alpha] = \alpha^*$ and $[\beta] = \beta^*$. By definition,
then, the product of the matrices in (2.47) is a matrix $C$ of type $(1, 1)$, that is, a
number $c$, which is equal to

$$c = \alpha_1 \beta_1 + \cdots + \alpha_n \beta_n. \tag{2.48}$$

In the cases $n = 2$ and $n = 3$, the product (2.48) coincides with the notion of the
scalar product of vectors, well known from courses in analytic (or even elementary)
geometry, if we consider $\alpha$ and $[\beta]$ as vectors whose coordinates are written
respectively in the form of a row and the form of a column.

Using formula (2.48), we can express the product rule of matrices given by
formula (2.46) by saying that one multiplies the rows of matrix $A$ by the columns of
matrix $B$. Put more precisely, the element $c_{ij}$ is determined by formula (2.48) as the
product of the $i$th row $\alpha_i$ of matrix $A$ and the $j$th column $[\beta]_j$ of matrix $B$.


Example 2.49 Let $A$ be a matrix of type $(m, n)$ from formula (1.4) (p. 2), and let
$[x]$ be a matrix of type $(n, 1)$, that is, a column vector, comprising the elements
$x_1, \ldots, x_n$, written analogously to the right-hand side of (2.47). Then their product
$A[x]$ is a matrix of type $(m, 1)$, that is, a column vector, comprising, by formula
(2.46), the elements

$$a_{i1} x_1 + a_{i2} x_2 + \cdots + a_{in} x_n, \quad i = 1, \ldots, m.$$

This shows that the system of linear equations (1.3) that we studied in Sect. 1.1 can
be written in the more abbreviated matrix form $A[x] = [b]$, where $[b]$ is a matrix of
type $(m, 1)$ comprising the constants of the system, $b_1, \ldots, b_m$, written as a column.


Example 2.50 By linear substitution is meant the replacement of variables whereby
old variables $(x_1, \ldots, x_m)$ are linear functions of some new variables $(y_1, \ldots, y_n)$,
that is, they are expressed by the formulas

$$\begin{aligned}
x_1 &= a_{11} y_1 + a_{12} y_2 + \cdots + a_{1n} y_n, \\
x_2 &= a_{21} y_1 + a_{22} y_2 + \cdots + a_{2n} y_n, \\
&\;\;\vdots \\
x_m &= a_{m1} y_1 + a_{m2} y_2 + \cdots + a_{mn} y_n,
\end{aligned} \tag{2.49}$$

with certain coefficients $a_{ij}$. The matrix $A = (a_{ij})$ is called the matrix of the
substitution (2.49). Let us consider the result of two linear substitutions. Let the variables
$(y_1, \ldots, y_n)$ be expressed in turn by $(z_1, \ldots, z_k)$ according to the formula

$$\begin{aligned}
y_1 &= b_{11} z_1 + b_{12} z_2 + \cdots + b_{1k} z_k, \\
y_2 &= b_{21} z_1 + b_{22} z_2 + \cdots + b_{2k} z_k, \\
&\;\;\vdots \\
y_n &= b_{n1} z_1 + b_{n2} z_2 + \cdots + b_{nk} z_k,
\end{aligned} \tag{2.50}$$

with coefficients $b_{ij}$. Substituting formulas (2.50) into (2.49), we obtain an
expression for the variables $(x_1, \ldots, x_m)$ in terms of $(z_1, \ldots, z_k)$:

$$\begin{aligned}
x_i &= a_{i1} (b_{11} z_1 + \cdots + b_{1k} z_k) + \cdots + a_{in} (b_{n1} z_1 + \cdots + b_{nk} z_k) \\
&= (a_{i1} b_{11} + \cdots + a_{in} b_{n1}) z_1 + \cdots + (a_{i1} b_{1k} + \cdots + a_{in} b_{nk}) z_k.
\end{aligned} \tag{2.51}$$

As was done in the previous example, we may write linear substitutions (2.49) and
(2.50) in the matrix forms $[x] = A[y]$ and $[y] = B[z]$, where $[x]$, $[y]$, $[z]$ are
column vectors, whose elements are the corresponding variables, while $A$ and $B$ are
matrices of types $(m, n)$ and $(n, k)$ with elements $a_{ij}$ and $b_{ij}$. Then, by definition
(2.46), formula (2.51) assumes the form $[x] = C[z]$, where the matrix $C$ is equal to
$AB$. In other words, successive application of two linear substitutions gives a linear
substitution whose matrix is equal to the product of the matrices of the substitutions.
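The conclusion of Example 2.50 can be checked on concrete numbers: applying the substitution with matrix $B$ and then the one with matrix $A$ should give the same values as the single substitution with matrix $AB$. The sketch below assumes matrices as lists of rows and columns of values as plain lists; `matmul` and `substitute` are hypothetical helper names.

```python
def matmul(a, b):
    """Matrix product by the rule 'row times column'."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def substitute(a, values):
    """Values of the old variables obtained from the new ones by the matrix a."""
    return [sum(x * y for x, y in zip(row, values)) for row in a]

A = [[1, 2],
     [0, 1],
     [3, -1]]             # type (3, 2): x_1, x_2, x_3 in terms of y_1, y_2
B = [[2, 1, 0],
     [1, -1, 4]]          # type (2, 3): y_1, y_2 in terms of z_1, z_2, z_3
z = [1, 2, 3]

y = substitute(B, z)
x_in_two_steps = substitute(A, y)
x_in_one_step = substitute(matmul(A, B), z)
print(y, x_in_two_steps, x_in_one_step)
```

Both routes give the same values of $x_1, x_2, x_3$, in agreement with formula (2.51).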


Remark 2.51 All of this makes it possible to formulate a definition of matrix product 
in terms of linear substitutions: the matrix product of A and B is the matrix C that 
is the matrix of the substitution obtained by successive applications of two linear 
substitutions with matrices A and B. 


This obvious remark makes it possible to give a simple and graphic demonstra- 
tion of an important property of the matrix product, called associativity. 


Theorem 2.52 Let $A$ be a matrix of type $(m, n)$, $B$ a matrix of type $(n, k)$,
and $D$ a matrix of type $(k, l)$. Then

$$(AB)D = A(BD). \tag{2.52}$$

Proof Let us first consider the special case $l = 1$, that is, the matrix $D$ in (2.52)
is a $k$-element column vector. As we have remarked, (2.52) is in this case a simple
consequence of the interpretation of the matrix product of $A$ and $B$ as the
result of carrying out two linear substitutions of the variables; in the notation of
Example 2.50, we have simply to substitute $[z] = D$ and then use the equalities
$[y] = B[z]$, $[x] = A[y]$, and $[x] = C[z]$.

In the general case, it suffices for the proof of equation (2.52) to observe that
the product of matrices $A$ and $B$ is reduced to the successive multiplication of
the rows of $A$ by the columns of $B$. That is, if we write the matrix $B$ in
column form, $B = (B_1, \ldots, B_k)$, then $AB$ can analogously be written in the form
$AB = (AB_1, \ldots, AB_k)$, where each $AB_i$ is a matrix of type $(m, 1)$, that is, also
a column vector. After this, the proof of equality (2.52) in the general case is almost
self-evident. Let $D$ consist of $l$ columns: $D = (D_1, \ldots, D_l)$. Then on the left-hand
side of (2.52), one has the matrix

$$(AB)D = ((AB)D_1, \ldots, (AB)D_l),$$

and on the right-hand side, the matrix

$$A(BD) = A(BD_1, \ldots, BD_l) = (A(BD_1), \ldots, A(BD_l)),$$

and it remains only to use the proved equality (2.52) with $l = 1$ for each of the
column vectors $D_1, \ldots, D_l$.


Let us note that we already considered the associative property in a more abstract 
form (p. xv). By what was proved there, it follows that the product of any number 
of factors does not depend on the arrangement of parentheses among them. Thus 
the associative property makes it possible to compute the product of an arbitrary 
number of matrices without indicating any arrangement of parentheses (it is nec- 
essary only that each pair of associated matrices correspond as to their dimensions 
so that multiplication is defined). In particular, the result of the product of an arbi- 
trary square matrix by itself an arbitrary number of times is well defined. It is called 
exponentiation. 

Just as for numbers, the operations of addition and multiplication of matrices are 
linked by the relationships 


$$A(B + C) = AB + AC, \qquad (A + B)C = AC + BC, \tag{2.53}$$


which clearly follow from the definitions. The property (2.53) connecting addition 
and multiplication is called the distributive property. 

We mention one important property of multiplication involving the identity
matrix: for an arbitrary matrix $A$ of type $(m, n)$ and an arbitrary matrix $B$ of type
$(n, m)$, the following equalities hold:

$$A E_n = A, \qquad E_n B = B.$$


The proofs of both equalities follow from the definition of matrix multiplication, for 
example, using the rule “row times column.” We see, then, that multiplication by the 
matrix E plays the same role as multiplication by 1 among ordinary numbers. 
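The associative property (2.52), the distributive property (2.53), and the role of the identity matrix can all be spot-checked on small integer matrices. This is a sketch only, assuming matrices as lists of rows; `matmul`, `add`, and `identity` are illustrative names, and a few examples are of course no substitute for the proofs.

```python
def matmul(a, b):
    """Matrix product by the rule 'row times column'."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Sum of two matrices of the same type."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def identity(n):
    """The identity matrix of order n."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [4, 5, 6]]        # type (2, 3)
B = [[1, 0], [0, 1], [2, -1]]     # type (3, 2)
C = [[0, 1], [1, 1], [-1, 2]]     # type (3, 2)
D = [[2, 1], [0, 3]]              # type (2, 2)

print(matmul(matmul(A, B), D) == matmul(A, matmul(B, D)))         # (AB)D = A(BD)
print(matmul(A, add(B, C)) == add(matmul(A, B), matmul(A, C)))    # A(B + C) = AB + AC
print(matmul(add(B, C), D) == add(matmul(B, D), matmul(C, D)))    # (B + C)D = BD + CD
print(matmul(A, identity(3)) == A, matmul(identity(3), B) == B)   # A E_3 = A, E_3 B = B
```

Every comparison prints `True` for these matrices.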

However, another familiar property of multiplication of numbers (called
commutativity), namely that the product of two numbers is independent of the order in
which they are multiplied, is not true for matrix multiplication. This follows at a
minimum from the fact that the product $AB$ of a matrix $A$ of type $(n, m)$ and a
matrix $B$ of type $(l, k)$ is defined only if $m = l$. It could well be that $m = l$ but $k \neq n$,
and then the matrix product $BA$ would not be defined, while the product $AB$ was.
But even, for example, in the case $n = m = k = l = 2$, with

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad B = \begin{pmatrix} p & q \\ r & s \end{pmatrix},$$

where both products $AB$ and $BA$ are defined, we obtain

$$AB = \begin{pmatrix} ap + br & aq + bs \\ cp + dr & cq + ds \end{pmatrix}, \qquad BA = \begin{pmatrix} ap + cq & bp + dq \\ ar + cs & br + ds \end{pmatrix},$$

and these are in general unequal matrices. Matrices $A$ and $B$ for which $AB = BA$
are called commuting matrices.
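A concrete pair of matrices of order 2 makes the failure of commutativity visible. The following sketch assumes matrices as lists of rows; `matmul` and `is_defined` are hypothetical helpers, and the matrices are arbitrary examples chosen for illustration.

```python
def is_defined(a, b):
    """True if the product AB is defined: columns of A equal rows of B."""
    return all(len(row) == len(b) for row in a)

def matmul(a, b):
    """Matrix product by the rule 'row times column'."""
    if not is_defined(a, b):
        raise ValueError("the number of columns of A must equal the number of rows of B")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
print(matmul(A, B))   # [[2, 1], [4, 3]]: the columns of A are interchanged
print(matmul(B, A))   # [[3, 4], [1, 2]]: the rows of A are interchanged

# A pair of commuting matrices: A and a multiple of the identity matrix.
S = [[2, 0],
     [0, 2]]
print(matmul(A, S) == matmul(S, A))

# AB may be defined while BA is not: types (2, 3) and (3, 4).
P = [[1, 2, 3], [4, 5, 6]]
Q = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
print(is_defined(P, Q), is_defined(Q, P))
```

In this example, multiplying by $B$ on the right interchanges the columns of $A$, while multiplying on the left interchanges its rows, so the two products differ.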




In connection with the multiplication of matrices, notation is used that we will
introduce only in the special case that we shall actually encounter in what follows.
Assume that we are given a square matrix $A$ of order $n$ and a natural number $p < n$.
The elements of the matrix $A$ located in the first $p$ rows and first $p$ columns form
a square matrix $A_{11}$ of order $p$. The elements located in the first $p$ rows and last
$n - p$ columns form a rectangular matrix $A_{12}$ of type $(p, n - p)$. The elements
located in the first $p$ columns and last $n - p$ rows form a rectangular matrix $A_{21}$ of
type $(n - p, p)$. Finally, the elements in the last $n - p$ rows and last $n - p$ columns
form a square matrix $A_{22}$ of order $n - p$. This can be written as follows:

$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}. \tag{2.54}$$

Formula (2.54) is called the expression of $A$ in block form, while matrices
$A_{11}, A_{12}, A_{21}, A_{22}$ are the blocks of the matrix $A$. For example, with these
conventions, formula (2.15) takes the form

$$\begin{vmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{vmatrix} = |A_{11}| \cdot |A_{22}|.$$

Clearly, one can conceive of a matrix $A$ in block form for a larger number of matrix
blocks of various sizes. In addition to the case (2.54) shown above, we shall find
ourselves in the situation in which blocks stand on the diagonal:

$$A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{pmatrix}.$$

Here $A_i$ are square matrices of orders $n_i$, $i = 1, \ldots, k$. Then $A$ is a square matrix of
order $n = n_1 + \cdots + n_k$. It is called a block-diagonal matrix.

It is sometimes convenient to notate matrix multiplication in block form. We shall 
consider only the case of two square matrices of order n, broken into blocks of the 
form (2.54) all of the same size: 


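\[ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}. \tag{2.55} \]

Here $A_{11}$ and $B_{11}$ are square matrices of order p, $A_{12}$ and $B_{12}$ are matrices of type
(p, n − p), $A_{21}$ and $B_{21}$ are matrices of type (n − p, p), and $A_{22}$ and $B_{22}$ are square
matrices of order n − p. Then the product C = AB is well defined and is a matrix
of order n that can be broken into the same type of blocks:

\[ C = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}. \]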


We claim that in this case, 


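\[
\begin{aligned}
C_{11} &= A_{11}B_{11} + A_{12}B_{21}, & C_{12} &= A_{11}B_{12} + A_{12}B_{22}, \\
C_{21} &= A_{21}B_{11} + A_{22}B_{21}, & C_{22} &= A_{21}B_{12} + A_{22}B_{22}.
\end{aligned}
\tag{2.56}
\]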


In other words, the matrices (2.55) are multiplied just like matrices of order 2, 
except that their elements are not numbers, but blocks, that is, they are themselves 
matrices. The proof of formulas (2.56) follows at once from formulas (2.46). For 
example, let $C = (c_{ij})$, where 1 ≤ i, j ≤ p. In formula (2.46), the sum of the first
p terms gives the element $c'_{ij}$ of the matrix $A_{11}B_{11}$, while the sum of the remaining
n − p terms gives the element $c''_{ij}$ of the matrix $A_{12}B_{21}$, so that
$c_{ij} = c'_{ij} + c''_{ij}$. Of course, analogous
formulas hold as well (with the same proof) for the multiplication of rectangular 
matrices with differing decompositions into blocks; it is necessary only that these 
partitions agree among themselves in such a way that the products of all matrices 
appearing in the formulas are defined. However, in what follows, only the case (2.55) 
described above will be necessary. 
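Formulas (2.56) can be checked numerically on a small example. The following is only an illustrative sketch, not part of the original text: the helper names matmul, matadd, split, and join are assumptions introduced here, matrices are stored as lists of rows, and the block size p = 1 and the entries of A and B are arbitrary choices.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows (formula (2.46))."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    """Entrywise sum of two matrices of the same type."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M, p):
    """Cut a square matrix into the four blocks of (2.54)."""
    top, bottom = M[:p], M[p:]
    return ([r[:p] for r in top], [r[p:] for r in top],
            [r[:p] for r in bottom], [r[p:] for r in bottom])

def join(M11, M12, M21, M22):
    """Reassemble a matrix from its four blocks."""
    return ([a + b for a, b in zip(M11, M12)] +
            [a + b for a, b in zip(M21, M22)])

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
B = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
p = 1
A11, A12, A21, A22 = split(A, p)
B11, B12, B21, B22 = split(B, p)

# Multiply the blocks "like matrices of order 2", as in (2.56).
C_blocks = join(
    matadd(matmul(A11, B11), matmul(A12, B21)),
    matadd(matmul(A11, B12), matmul(A12, B22)),
    matadd(matmul(A21, B11), matmul(A22, B21)),
    matadd(matmul(A21, B12), matmul(A22, B22)),
)
print(C_blocks == matmul(A, B))
```

The comparison in the last line is between the blockwise product and the ordinary product AB.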

The transpose operation is connected with multiplication by an important rela- 
tionship. Let the matrix A be of type (n,m), and matrix B of type (m, k). Then 


(AB)* = B*A*. (2.57) 


Indeed, by the definition of matrix product (formula (2.46)), an element of the matrix 
AB standing at the intersection of the jth row and ith column is equal to 


\[ a_{j1}b_{1i} + a_{j2}b_{2i} + \cdots + a_{jm}b_{mi}, \quad \text{where } j = 1, \ldots, n, \ i = 1, \ldots, k. \tag{2.58} \]


By definition of the transpose, the expression (2.58) gives us the value of the element 
of the matrix (AB)* standing at the intersection of the ith row and the jth column.
On the other hand, let us consider the product of matrices B* and A*, using the 
rule “row times column” formulated above. Then, taking into account the definition 
of the transpose, we obtain that the element of the matrix B*A* standing at the 
intersection of the ith row and jth column is equal to the product of the ith column 
of the matrix B and the jth row of the matrix A, that is, equal to 


\[ b_{1i}a_{j1} + b_{2i}a_{j2} + \cdots + b_{mi}a_{jm}. \]


This expression coincides with the formula (2.58) for the element of the matrix 
(AB)* standing at the corresponding place, and this establishes equality (2.57). 
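As a quick sanity check of equality (2.57), one may compare both sides for a pair of rectangular matrices. This is a minimal sketch under the assumption that matrices are stored as lists of rows; the function names matmul and transpose and the sample entries are introduced here only for illustration.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows (row times column)."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    """Rows of the result are the columns of M."""
    return [list(col) for col in zip(*M)]

A = [[1, 2, 3],
     [4, 5, 6]]          # type (2, 3)
B = [[1, 0, 2, 1],
     [2, 1, 0, 0],
     [0, 3, 1, 4]]       # type (3, 4)

lhs = transpose(matmul(A, B))              # (AB)*, of type (4, 2)
rhs = matmul(transpose(B), transpose(A))   # B*A*, of type (4, 2)
print(lhs == rhs)
```

Note that the product A*B* is not even defined for these types, which is why the order of the factors must be reversed.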

It is possible to express, using the operation of multiplication, the elementary 
transformations of matrices that we used in Sect. 1.2 in studying systems of linear 
equations. Without specifying this especially, we shall continue to keep in mind that 
we are always multiplying matrices whose product is well defined. 

Suppose that we are given a rectangular matrix 




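\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}. \]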


Let us consider a square matrix of order m obtained from the identity matrix of order 
m by interchanging the ith and jth rows: 


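\[
T_{ij} = \begin{pmatrix}
1 &        &        &        &        &        &   \\
  & \ddots &        &        &        &        &   \\
  &        & 0      & \cdots & 1      &        &   \\
  &        & \vdots & \ddots & \vdots &        &   \\
  &        & 1      & \cdots & 0      &        &   \\
  &        &        &        &        & \ddots &   \\
  &        &        &        &        &        & 1
\end{pmatrix}.
\]

Here the two displayed rows containing off-diagonal entries are the ith and the jth: the
entries in positions (i, j) and (j, i) are equal to 1, the entries in positions (i, i) and (j, j)
are equal to 0, the remaining diagonal entries are equal to 1, and all other entries are 0.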

An easy check shows that $T_{ij}A$ is also obtained from A by transposing the ith and
jth rows. Therefore, we can express an elementary operation of type I on a matrix
A by multiplication on the left by a suitable matrix $T_{ij}$.

Let us consider (for i ≠ j) a square matrix $U_{ij}(c)$ of order m depending on the
number c:


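\[
U_{ij}(c) = \begin{pmatrix}
1 &        &        &        &        &        &   \\
  & \ddots &        &        &        &        &   \\
  &        & 1      & \cdots & c      &        &   \\
  &        &        & \ddots & \vdots &        &   \\
  &        &        &        & 1      &        &   \\
  &        &        &        &        & \ddots &   \\
  &        &        &        &        &        & 1
\end{pmatrix}.
\tag{2.59}
\]

Here the number c stands at the intersection of the ith row and the jth column, all
diagonal entries are equal to 1, and all other entries are 0.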




It is obtained from the identity matrix of order m by adding the jth row multiplied 
by c to the ith row. An equally easy verification shows that the matrix $U_{ij}(c)A$ is
obtained from A by adding the jth row multiplied by the number c to the ith row. 
Therefore, we can also write an elementary operation of type II in terms of matrix 
multiplication. Consequently, Theorem 1.15 in matrix form can be expressed as 
follows: 


Theorem 2.53 An arbitrary matrix A of type (m, n) can be brought into echelon
form by multiplying on the left by the product of a number of suitable matrices $T_{ij}$
and $U_{ij}(c)$ (in the proper order).
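The effect of the matrices $T_{ij}$ and $U_{ij}(c)$ can be illustrated on a concrete example. In the sketch below, which is not part of the original text, the functions identity, T, U, and matmul are hypothetical helpers, rows and columns are numbered from 0 rather than from 1, and the matrix A is an arbitrary choice.

```python
def identity(m):
    """Identity matrix of order m as a list of rows."""
    return [[int(r == s) for s in range(m)] for r in range(m)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def T(m, i, j):
    """Identity matrix of order m with rows i and j interchanged."""
    M = identity(m)
    M[i], M[j] = M[j], M[i]
    return M

def U(m, i, j, c):
    """Identity matrix of order m with c placed in row i, column j (i != j)."""
    M = identity(m)
    M[i][j] = c
    return M

A = [[0, 2, 1],
     [1, 1, 1],
     [2, 4, 7]]

# Type I: interchange rows 0 and 1.
step1 = matmul(T(3, 0, 1), A)
# Type II: add row 0 multiplied by -2 to row 2, then row 1 multiplied by -1 to row 2.
step2 = matmul(U(3, 2, 0, -2), step1)
echelon = matmul(U(3, 2, 1, -1), step2)

# The same result is obtained by one multiplication on the left by the product P.
P = matmul(U(3, 2, 1, -1), matmul(U(3, 2, 0, -2), T(3, 0, 1)))
print(echelon)
print(matmul(P, A) == echelon)
```

The factor corresponding to each successive operation stands to the left of the previous ones, which is the "proper order" mentioned in Theorem 2.53.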


Let us examine the important case in which A and B are square matrices of 
order n. Then their product C = AB is also a square matrix of order n. 


Theorem 2.54 The determinant of the product of two square matrices of identical 
orders is equal to the product of their determinants. That is, |AB| = |A| · |B|.


Proof Let us consider the determinant |AB| for a fixed matrix B as a function, 
which we denote by F(A), of the rows of the matrix A. We shall prove first that 
the function F(A) is multilinear. We know (by Property 2.4 from Sect. 2.2) that 
the determinant |C| = F(A), considered as a function of the rows of the matrix 
C = AB, is multilinear. In particular, it is a linear function of the ith row of the 
matrix C, that is, 


\[ F(A) = \alpha_1 c_{i1} + \alpha_2 c_{i2} + \cdots + \alpha_n c_{in} \tag{2.60} \]

for some numbers $\alpha_1, \ldots, \alpha_n$. Let us focus attention on the fact that according to
formula (2.46), the ith row of the matrix C = AB depends only on the ith row of 
the matrix A, while the remaining rows of the matrix C, in contrast, do not depend 
on this row. After substituting into formula (2.60) the expressions (2.46) for the el- 
ements of the ith row and collecting like terms, we obtain an expression for F(A) 
as a linear function of the ith row of the matrix A. Therefore, the function F(A) is 
multilinear in the rows of A. Now let us transpose two rows of the matrix A, say
with indices $i_1$ and $i_2$. Formula (2.46) shows us that the lth row of the matrix C
for $l \ne i_1, i_2$ does not change, but its $i_1$th and $i_2$th rows exchange places. Therefore,
|C| changes sign. This means that the function F(A) is antisymmetric with respect
to the rows of the matrix A. We can apply to this function Theorem 2.15, and we
then obtain that F(A) = k|A|, where k = F(E) = |EB| = |B|, since for an arbitrary
matrix B, the relationship EB = B is satisfied. We thereby obtain the equality
F(A) = |A| · |B|, where by our definition, F(A) = |AB|.
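Theorem 2.54 and the antisymmetry of F(A) used in its proof are easy to test on small integer matrices. The following sketch is an illustration only: the function det computes the determinant by expansion along the first row (a choice made here for brevity, suitable only for small orders), and all names and sample matrices are assumptions.

```python
def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det(M):
    """Determinant of a square matrix by expansion along its first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
B = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]

# |AB| = |A| . |B|
print(det(matmul(A, B)) == det(A) * det(B))

# Interchanging two rows of A changes the sign of F(A) = |AB|.
A_swapped = [A[1], A[0], A[2]]
print(det(matmul(A_swapped, B)) == -det(matmul(A, B)))
```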


Theorem 2.54 has a beautiful generalization to rectangular matrices known as 
the Cauchy-Binet identity. We shall not prove it at present, but shall give only its 
formulation (a natural proof will be given in Sect. 10.5 on p. 377). 

The product of two rectangular matrices B and A results in a square matrix of 
order m if B is of type (m,n), and A is of type (n, m). The minors of the matrices B 




and A of the same order equal to the lesser of n and m are called associates if they 
stand in the columns (of matrix B) and rows (of matrix A) with the same indices. 
The Cauchy-Binet identity asserts that the determinant |BA| is equal to 0 if n < m,
and |BA| is equal to the sum of the products of all pairs of associated minors of order m
if n ≥ m. In this case, the sum is taken over all collections of rows (of matrix A) and
columns (of matrix B) with increasing indices $i_1 < i_2 < \cdots < i_m$.

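We have a beautiful special case of the Cauchy-Binet identity when

\[ B = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \\ b_1 & b_2 & \cdots & b_n \end{pmatrix}, \qquad A = \begin{pmatrix} a_1 & b_1 \\ a_2 & b_2 \\ \vdots & \vdots \\ a_n & b_n \end{pmatrix}. \]

Then

\[ BA = \begin{pmatrix} a_1^2 + a_2^2 + \cdots + a_n^2 & a_1b_1 + a_2b_2 + \cdots + a_nb_n \\ a_1b_1 + a_2b_2 + \cdots + a_nb_n & b_1^2 + b_2^2 + \cdots + b_n^2 \end{pmatrix}, \]

and the associated minors assume the form

\[ \begin{vmatrix} a_i & a_j \\ b_i & b_j \end{vmatrix}, \qquad \begin{vmatrix} a_i & b_i \\ a_j & b_j \end{vmatrix} \]

for all i < j, taking values from 1 to n. Each of these two minors is equal to
$a_ib_j - a_jb_i$. The Cauchy-Binet identity gives us the equality

\[ (a_1^2 + a_2^2 + \cdots + a_n^2)(b_1^2 + b_2^2 + \cdots + b_n^2) - (a_1b_1 + a_2b_2 + \cdots + a_nb_n)^2 = \sum_{i<j} (a_ib_j - a_jb_i)^2. \]

In particular, we derive from it the well-known inequality

\[ (a_1^2 + a_2^2 + \cdots + a_n^2)(b_1^2 + b_2^2 + \cdots + b_n^2) \ge (a_1b_1 + a_2b_2 + \cdots + a_nb_n)^2. \]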
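The equality obtained from the Cauchy-Binet identity can be verified directly for particular numbers. The sketch below is an illustration with arbitrarily chosen values of $a_i$ and $b_i$ (indices are counted from 0 in the code); the variable names are assumptions introduced here.

```python
from itertools import combinations

a = [1, 2, 3, 4]
b = [2, -1, 0, 5]

# Left-hand side: the determinant of the 2 x 2 matrix BA.
lhs = (sum(x * x for x in a) * sum(y * y for y in b)
       - sum(x * y for x, y in zip(a, b)) ** 2)

# Right-hand side: the sum of products of associated minors over all pairs i < j.
rhs = sum((a[i] * b[j] - a[j] * b[i]) ** 2
          for i, j in combinations(range(len(a)), 2))

print(lhs == rhs)
print(lhs >= 0)   # the inequality follows because rhs is a sum of squares
```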


The operations of addition and multiplication of matrices make it possible to 
define polynomials in matrices. In this we shall of course assume that we are always 
speaking about square matrices of a certain fixed order. We shall first define the 
operation of exponentiation, namely raising a matrix to the nth power. By definition,
$A^n$ for n > 0 is the result of multiplying the matrix A by itself n times, while for
n = 0, the result will be the identity matrix E.


Definition 2.55 Let $f(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_k x^k$ be a polynomial with numeric
coefficients. Then a matrix polynomial f for a matrix A is the matrix

\[ f(A) = \alpha_0 E + \alpha_1 A + \cdots + \alpha_k A^k. \]


Let us establish some simple properties of matrix polynomials. 




Lemma 2.56 If f(x) + g(x) = u(x) and f(x)g(x) = v(x), then for an arbitrary
square matrix A we have

\[ f(A) + g(A) = u(A), \tag{2.61} \]
\[ f(A)g(A) = v(A). \tag{2.62} \]


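Proof Let $f(x) = \sum_{i=0}^{n} \alpha_i x^i$ and $g(x) = \sum_{j=0}^{m} \beta_j x^j$. Then $u(x) = \sum_r \gamma_r x^r$ and
$v(x) = \sum_s \delta_s x^s$, where the coefficients $\gamma_r$ and $\delta_s$ can be written in the form

\[ \gamma_r = \alpha_r + \beta_r, \qquad \delta_s = \sum_{i=0}^{s} \alpha_i \beta_{s-i}, \]

where $\alpha_r = 0$ if r > n, and $\beta_r = 0$ if r > m. The equality (2.61) is now perfectly
obvious. For the proof of (2.62), we observe that

\[ f(A)g(A) = \sum_{i=0}^{n} \alpha_i A^i \cdot \sum_{j=0}^{m} \beta_j A^j = \sum_{i,j} \alpha_i \beta_j A^{i+j}. \]

Collecting all terms for which i + j = s, we obtain formula (2.62).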


Corollary 2.57 The polynomials f(A) and g(A) for the same matrix A commute: 
f(A)g(A) = g(A)f(A).


Proof The result follows from formula (2.62) and the equality f(x)g(x) = 
g(x) f(x). 


Let us observe that the analogous assertion to the lemma just proved is not true for 
polynomials in several variables. For example, the identity $(x + y)(x - y) = x^2 - y^2$
will not be preserved in general if we replace x and y with arbitrary matrices. The 
reason for this is that the identity depends on the relationship xy = yx, which does 
not hold for arbitrary matrices. 
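Lemma 2.56, Corollary 2.57, and the failure of the two-variable identity can all be observed on matrices of order 2. The following is a sketch only: polynomials are represented by their lists of coefficients in increasing order of degree, and the helper names (poly_at, poly_mul, and so on) and the sample matrices are assumptions introduced for this illustration.

```python
def identity(n):
    return [[int(r == s) for s in range(n)] for r in range(n)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, M):
    return [[c * a for a in row] for row in M]

def poly_at(coeffs, A):
    """Value f(A) = coeffs[0] E + coeffs[1] A + ... of a matrix polynomial."""
    result = scale(0, A)
    power = identity(len(A))          # A^0 = E
    for c in coeffs:
        result = matadd(result, scale(c, power))
        power = matmul(power, A)
    return result

def poly_mul(f, g):
    """Coefficients of the product f(x) g(x)."""
    v = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            v[i + j] += a * b
    return v

A = [[1, 2], [3, 4]]
f = [1, 0, 2]        # f(x) = 1 + 2x^2
g = [-3, 1]          # g(x) = -3 + x
v = poly_mul(f, g)

fA, gA = poly_at(f, A), poly_at(g, A)
print(matmul(fA, gA) == poly_at(v, A))      # formula (2.62)
print(matmul(fA, gA) == matmul(gA, fA))     # Corollary 2.57

# Two matrices that do not commute: (X + Y)(X - Y) differs from X^2 - Y^2.
X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]
left = matmul(matadd(X, Y), matadd(X, scale(-1, Y)))
right = matadd(matmul(X, X), scale(-1, matmul(Y, Y)))
print(left, right)
```

For these X and Y, both squares are zero matrices, while the product (X + Y)(X − Y) is not.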


2.10 Inverse Matrices 
In this section we shall consider exclusively square matrices of a given order n. 


Definition 2.58 A matrix B is called the inverse of the matrix A if 
AB=E. (2.63) 


Here E denotes the identity matrix of the fixed order n. 




Not every matrix has an inverse. Indeed, applying Theorem 2.54 on the determi- 
nant of a matrix product to equality (2.63), we obtain 


\[ |E| = |AB| = |A| \cdot |B|, \]


and since |E| = 1, then we must have |A| · |B| = 1. Clearly, such a relationship
is impossible if |A| = 0. Therefore, no singular matrix can have an inverse. The 
following theorem shows that the converse of this statement is also true. 


Theorem 2.59 For every nonsingular matrix A there exists a matrix B satisfying 
the relationship (2.63). 


Proof Let us denote the yet unknown jth column of the desired inverse matrix B by
$[b]_j$, while $[e]_j$ will denote the jth column of the identity matrix E. The columns
$[b]_j$ and $[e]_j$ are matrices of type (n, 1), and by the product rule for matrices, the
equality (2.63) is equivalent to the n relationships

\[ A[b]_j = [e]_j, \quad j = 1, \ldots, n. \tag{2.64} \]

Therefore, it suffices to prove the solvability of each (for each fixed j) system of
linear equations (2.64) for the n unknowns that are the elements of the matrix B
appearing in column $[b]_j$. But for every index j, the matrix of this system is A, and
by hypothesis, |A| ≠ 0. By Theorem 2.12, such a system has a solution (and indeed,
a unique one). Taking the solution of the system obtained for each index j as the 
jth column of the matrix B, we obtain a matrix satisfying the condition (2.63), that 
is, we have found an inverse to the matrix A. 


Let us recall that matrix multiplication is not commutative, that is, in general, 
AB ≠ BA. Therefore, it would be natural to consider another possible definition of
the inverse matrix of A, namely a matrix C such that 


CA=E. (2.65) 


The same reasoning as that carried out at the beginning of this section shows that 
such a matrix C does not exist if A is singular. 


Theorem 2.60 For an arbitrary nonsingular matrix A, there exists a matrix C sat- 
isfying relationship (2.65). 


Proof This theorem can be proved in two different ways. First, it would be possible 
to repeat in full the proof of Theorem 2.59, considering now instead of the columns 
of the matrices C and E, their rows. But perhaps there is a somewhat more elegant 
proof that derives Theorem 2.60 directly from Theorem 2.59. To this end, let us 
apply Theorem 2.59 to the transpose matrix A*. By Theorem 2.32, |A*| = |A|, and
therefore, |A*| ≠ 0, which means that there exists a matrix B such that


A*B = E. (2.66)




Let us apply the transpose operation to both sides of (2.66). It is clear that E* = E. 
On the other hand, by (2.57), 


(A*B)* = B*(A*)*,


and it is easily verified that (A*)* = A. We therefore obtain B* A = E, and in (2.65) 
we can take the matrix B* for C, where B is defined by (2.66). 


The matrices B from (2.63) and C from (2.65) can make equal claim to the title of 
inverse of the matrix A. Fortunately, we do not obtain here two different definitions 
of the inverse, since these two matrices coincide. Namely, we have the following 
result. 


Theorem 2.61 For any nonsingular matrix A there exists a unique matrix B sat- 
isfying (2.63) and a unique matrix C satisfying (2.65). Moreover, the two matrices 
are equal. 


Proof Let A be a nonsingular matrix. We shall show that the matrix B satisfy- 
ing (2.63) is unique. Let us assume that there exists another matrix, B’, such that 
AB’ = E. Then AB = AB’, and if we multiply both sides of this equality by the 
matrix C such that CA = E, whose existence is guaranteed by Theorem 2.60, then 
by the associative property of matrix multiplication, we obtain (CA)B = (CA)B’, 
whence follows the equality EB = EB’, that is, B = B’. In exactly the same way 
we can prove the uniqueness of C satisfying (2.65). 

Now let us show that B = C. To this end, we consider the product C(AB) and
make use of the associative property of multiplication: 


C(AB) = (CA)B. (2.67) 


Then on the one hand, AB = E and C(AB) = CE = C, while on the other hand, 
CA=E and (CA)B = EB = B, and relationship (2.67) gives us B=C. 


This unique (by Theorem 2.61) matrix B = C is denoted by $A^{-1}$ and is called
the inverse of the matrix A. Thus for every nonsingular matrix A, there exists an
inverse matrix $A^{-1}$ satisfying the relationship

\[ AA^{-1} = A^{-1}A = E, \tag{2.68} \]

and such a matrix $A^{-1}$ is unique.

In following the proof of Theorem 2.59, we see that it is possible to derive an 
explicit formula for the inverse matrix. We again assume that the matrix A is non- 
singular, and following the notation used in the proof of Theorem 2.59, we arrive at 
the system of equations (2.64). Since |A| ≠ 0, we can find a solution of this system
using Cramer’s rule (2.35). For an arbitrary index j = 1, ..., n in system (2.64), the




ith unknown coincides with the element $b_{ij}$ of the matrix B. Using Cramer’s rule,
we obtain for it the value

\[ b_{ij} = \frac{D_{ij}}{|A|}, \tag{2.69} \]

where $D_{ij}$ is the determinant of the matrix obtained from A by replacing the ith
column by the column $[e]_j$. The determinant $D_{ij}$ can be expanded along the ith
column, and by formula (2.30), we obtain that it is equal to the cofactor of the
unique nonzero (and equal to 1) element of the ith column. Since the ith column is
equal to $[e]_j$, there is a 1 at the intersection of the ith column (which we replaced
by $[e]_j$) and the jth row. Therefore, $D_{ij} = A_{ji}$, and formula (2.69) yields

\[ b_{ij} = \frac{A_{ji}}{|A|}. \]

This is an explicit formula for the elements of the inverse matrix. In words, this can
be formulated thus: to obtain the inverse matrix of a nonsingular matrix A, one must
replace every element with its cofactor, then transpose the matrix thus obtained and
multiply it by the number $|A|^{-1}$.
For example, for the 2 × 2 matrix
For example, for the 2 x 2 matrix 


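\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \]

with δ = |A| = ad − bc ≠ 0, we obtain the inverse matrix

\[ A^{-1} = \begin{pmatrix} d/\delta & -b/\delta \\ -c/\delta & a/\delta \end{pmatrix}. \]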

The concept of inverse matrix provides a simple and elegant notation for the 
solution of a system of n equations in n unknowns. If, as in the previous section, 
we write down the system of linear equations (1.3) with n = m and A a nonsingular 
matrix in the form A[x] = [b], where [x] is the column of unknowns $x_1, \ldots, x_n$
and [b] is the column consisting of the constants of the system, then multiplying
this relationship on the left by the matrix $A^{-1}$, we obtain the solution in the form
$[x] = A^{-1}[b]$. Thus, in matrix notation, the formulas for the solution of a system
of n linear equations in n unknowns look just like those for a single equation in
a single unknown. But if we use the formulas for the inverse matrix, then we see
that the relationship $[x] = A^{-1}[b]$ exactly coincides with Cramer’s rule, so that this
more elegant notation gives us nothing essentially new. 

Let us consider the matrix $\widetilde{A} = (\widetilde{a}_{ij})$, in which the element $\widetilde{a}_{ij} = A_{ji}$ is the
cofactor of the element $a_{ji}$ of the matrix A. The matrix $\widetilde{A}$ is called the adjugate
matrix to A. For a matrix A of order n, the elements of the adjugate matrix are
polynomials of degree n − 1 in the elements of A. Formula (2.69) for the inverse
matrix shows that

\[ A\widetilde{A} = \widetilde{A}A = |A|E. \tag{2.70} \]




The advantage of the adjugate matrix $\widetilde{A}$ compared to the inverse matrix $A^{-1}$ is that
the definition of $\widetilde{A}$ does not require division by |A|, and formula (2.70), in contrast
to the analogous formula (2.68), holds even for |A| = 0, that is, even for singular
square matrices, as the proof of Cramer’s rule demonstrates. We shall make use of 
this fact in the sequel. 
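That formula (2.70) remains valid for a singular matrix can be seen on an example. In the following sketch (an illustration only; the names det, cofactor, and adjugate are assumptions, and indices are counted from 0), the matrix A has determinant zero, so both products with its adjugate are zero matrices.

```python
def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def det(M):
    """Determinant by expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * a * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j, a in enumerate(M[0]))

def cofactor(M, i, j):
    minor = [row[:j] + row[j + 1:] for r, row in enumerate(M) if r != i]
    return (-1) ** (i + j) * det(minor)

def adjugate(M):
    """Matrix whose (i, j) entry is the cofactor of the (j, i) entry of M."""
    n = len(M)
    return [[cofactor(M, j, i) for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]     # a singular matrix: |A| = 0
adjA = adjugate(A)
d = det(A)
dE = [[d * int(r == s) for s in range(3)] for r in range(3)]

print(d)
print(matmul(A, adjA) == dE and matmul(adjA, A) == dE)
```

No division occurs anywhere in this computation, which is the advantage of the adjugate matrix noted above.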

In conclusion, let us return once more to the question of presenting elementary 
operations in terms of matrix multiplication, which we began to examine in the 
previous section. It is easy to see that the matrices $T_{ij}$ and $U_{ij}(c)$ introduced there
are nonsingular, and moreover,

\[ T_{ij}^{-1} = T_{ij}, \qquad U_{ij}^{-1}(c) = U_{ij}(-c). \]

Therefore, Theorem 2.53 can be reformulated as follows: An arbitrary matrix A can
be obtained from a particular echelon matrix A′ by multiplying it on the left by
matrices $T_{ij}$ and $U_{ij}(c)$ in a certain order.

Let us apply this result to nonsingular square matrices of order n. Since $|T_{ij}| \ne 0$,
$|U_{ij}(c)| \ne 0$, and |A| ≠ 0 (by assumption), the matrix A′ must also be nonsingular.
But a nonsingular square echelon matrix is in upper triangular form, that is, all of
its elements below the main diagonal are equal to zero, namely,

\[
A' = \begin{pmatrix}
a'_{11} & a'_{12} & a'_{13} & \cdots & a'_{1n} \\
0 & a'_{22} & a'_{23} & \cdots & a'_{2n} \\
0 & 0 & a'_{33} & \cdots & a'_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a'_{nn}
\end{pmatrix},
\]

and moreover, $|A'| = a'_{11} a'_{22} \cdots a'_{nn}$. Therefore, all the elements $a'_{11}, \ldots, a'_{nn}$ on the
main diagonal are different from zero.

But this matrix A′ can be brought into a yet simpler form with the help of
elementary operations of type II only. Namely, since $a'_{nn} \ne 0$, one can subtract from
the rows with indices n − 1, n − 2, ..., 1 of the matrix A′ the last row multiplied by
factors that make all the elements of the nth column (except for $a'_{nn}$) equal to zero.
Since $a'_{n-1,n-1} \ne 0$, it is possible in the same way to reduce to zero all elements
of the (n − 1)st column (except for the element $a'_{n-1,n-1}$). Doing this n times, we
shall make all of the elements of the matrix equal to zero except those on the main
diagonal. That is, we end up with the matrix

\[
D = \begin{pmatrix}
a'_{11} & 0 & 0 & \cdots & 0 \\
0 & a'_{22} & 0 & \cdots & 0 \\
0 & 0 & a'_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a'_{nn}
\end{pmatrix}.
\tag{2.71}
\]


A matrix all of whose elements are equal to zero except for those on the main 
diagonal is called a diagonal matrix. We have thus proved that a matrix A’ can be 




obtained from a diagonal matrix D by multiplying it on the left by matrices of the
form $T_{ij}$ and $U_{ij}(c)$ in some order.

Let us note that multiplication by a matrix $T_{ij}$ (that is, an elementary operation
of type I) can be replaced by multiplication on the left by matrices of type $U_{ij}(c)$
for various c and by a certain simpler matrix. Namely, the interchange of the ith and
jth rows can be obtained using the following four operations:


1. Addition of the ith row to the jth row. 
2. Subtraction of the jth row from the ith row. 
3. Addition of the ith row to the jth row. 


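Schematically, this can be depicted as follows, where the ith and jth rows are
denoted by $c_i$ and $c_j$:

\[ \begin{pmatrix} c_i \\ c_j \end{pmatrix} \to \begin{pmatrix} c_i \\ c_i + c_j \end{pmatrix} \to \begin{pmatrix} -c_j \\ c_i + c_j \end{pmatrix} \to \begin{pmatrix} -c_j \\ c_i \end{pmatrix}. \]

4. It is now necessary to introduce a new type of operation: its effect is to multiply
the ith row by −1, and it is achieved by multiplying (with k = i) our matrix on the
left by the square matrix

\[
S_k = \begin{pmatrix}
1 &        &    &        &   \\
  & \ddots &    &        &   \\
  &        & -1 &        &   \\
  &        &    & \ddots &   \\
  &        &    &        & 1
\end{pmatrix},
\tag{2.72}
\]

where there is −1 at the intersection of the kth row and kth column.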
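The four operations listed above can be composed as matrices and compared with $T_{ij}$. The following sketch is only an illustration: the functions identity, T, U, and S are hypothetical helpers, indices are counted from 0, and the order m = 4 and the pair i = 1, j = 3 are arbitrary choices.

```python
def identity(m):
    return [[int(r == s) for s in range(m)] for r in range(m)]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def T(m, i, j):
    """Identity matrix with rows i and j interchanged."""
    M = identity(m)
    M[i], M[j] = M[j], M[i]
    return M

def U(m, i, j, c):
    """Left multiplication by U(m, i, j, c) adds c times row j to row i."""
    M = identity(m)
    M[i][j] = c
    return M

def S(m, k):
    """Left multiplication by S(m, k) multiplies row k by -1."""
    M = identity(m)
    M[k][k] = -1
    return M

m, i, j = 4, 1, 3
steps = [
    U(m, j, i, 1),    # 1. add the ith row to the jth row
    U(m, i, j, -1),   # 2. subtract the jth row from the ith row
    U(m, j, i, 1),    # 3. add the ith row to the jth row
    S(m, i),          # 4. multiply the ith row by -1
]

# Each successive operation multiplies on the left of the previous ones.
product = identity(m)
for step in steps:
    product = matmul(step, product)

print(product == T(m, i, j))
```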


We may now reformulate Theorem 2.53 as follows: 


Theorem 2.62 Any nonsingular matrix can be obtained from a diagonal matrix by 
multiplying it on the left by certain matrices $U_{ij}(c)$ of the form (2.59) and matrices
$S_k$ of the form (2.72).


We shall use this result in Sect. 4.4 when we introduce the orientation of a real 
vector space. Furthermore, Theorem 2.62 provides a simple and convenient method 
of computing the inverse matrix, in a manner based on Gaussian elimination. To this 
end, we introduce yet another (a third) type of elementary matrix operation, which 
consists in multiplying the kth row of a matrix by an arbitrary nonzero number α.
It is clear that the result of such an operation can be obtained by multiplying our 




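matrix on the left by the square matrix

\[
V_k(\alpha) = \begin{pmatrix}
1 &        &        &        &   \\
  & \ddots &        &        &   \\
  &        & \alpha &        &   \\
  &        &        & \ddots &   \\
  &        &        &        & 1
\end{pmatrix},
\tag{2.73}
\]

where the number α stands at the intersection of the kth row and kth column. By
multiplying the matrix (2.71) on the left by the matrices $V_1\bigl((a'_{11})^{-1}\bigr), \ldots, V_n\bigl((a'_{nn})^{-1}\bigr)$, we
transform it into the identity matrix.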

From Theorem 2.62, it follows that every nonsingular matrix can be obtained
from the identity matrix by multiplying it on the left by matrices $U_{ij}(c)$ of the type
given in (2.59), matrices $S_k$ from (2.72), and matrices $V_k(\alpha)$ of the form of (2.73).
However, since multiplication by each of these matrices is equivalent to an
elementary operation of one of the three types, this means that every nonsingular matrix
can be obtained from the identity matrix using a sequence of such operations, and
conversely, using a certain number of elementary operations of all three types, it is
possible to obtain the identity from an arbitrary nonsingular matrix. This gives us
a convenient method of computing the inverse matrix. Indeed, suppose that using
some sequence of elementary operations of all three types, we have transformed
matrix A to the identity matrix E. Let us denote by B the product of all the matrices
$U_{ij}(c)$, $S_k$, and $V_k(\alpha)$ corresponding to the given operations (in the
obvious order: the matrix representing each successive operation stands to the left
of the previous one). Then BA = E, from which it follows that $B = A^{-1}$. Then
after applying the same sequence of elementary operations to the matrix E, we obtain
from it the matrix BE = B, that is, $A^{-1}$. Therefore, to compute $A^{-1}$, it suffices to
transform the matrix A to E using elementary operations of the three types (as was
shown above), while simultaneously applying the same operations to the matrix E.
The matrix obtained from E as a result of the same elementary operations will be
$A^{-1}$.
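The method just described may be sketched as follows. This is an illustration under several assumptions: exact rational arithmetic from the standard module fractions is used, the function name inverse_by_elimination is introduced here, and the operations are applied column by column (row interchange, scaling of the pivot row, then clearing the rest of the column), which is a common variant of the order of steps described in the text.

```python
from fractions import Fraction

def inverse_by_elimination(A):
    """Transform A to E by elementary row operations, applying each one to E as well."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    E = [[Fraction(int(r == s)) for s in range(n)] for r in range(n)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the matrix is singular")
        if pivot != col:
            # Interchange of two rows.
            M[col], M[pivot] = M[pivot], M[col]
            E[col], E[pivot] = E[pivot], E[col]
        # Multiplication of the pivot row by a nonzero number.
        alpha = 1 / M[col][col]
        M[col] = [alpha * x for x in M[col]]
        E[col] = [alpha * x for x in E[col]]
        # Addition of a multiple of the pivot row to every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                c = -M[r][col]
                M[r] = [x + c * y for x, y in zip(M[r], M[col])]
                E[r] = [x + c * y for x, y in zip(E[r], E[col])]
    return E   # M is now the identity matrix, and E has become the inverse of A

A = [[0, 2, 1], [1, 1, 1], [2, 4, 7]]
B = inverse_by_elimination(A)
product = [[sum(B[i][t] * A[t][j] for t in range(3)) for j in range(3)]
           for i in range(3)]
print(product)
```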

Let C be an arbitrary matrix of type (m,n). We shall show that for an arbitrary 
nonsingular square matrix A of order m, the rank of the product AC is equal to 
the rank of C. Indeed, as we have already seen, the matrix A can be transformed 
into E by applying some sequence of elementary operations of the three types to its 
rows, to which corresponds multiplication on the left by the matrix $A^{-1}$. Applying
the same sequence of operations to AC, we clearly obtain the matrix $A^{-1}AC = C$.
By Theorem 2.37, the rank of a matrix is not changed by elementary operations 
of types I and II. It also does not change under elementary operations of type III. 
This clearly follows from the fact that every minor is a linear function of its rows, 
and consequently, every nonzero minor of a matrix remains a nonzero minor after 
multiplication of any of its rows by an arbitrary nonzero number. Therefore, the rank 
of the matrix AC is equal to the rank of C. 




Using an analogous argument for the columns as was given for the rows, or sim- 
ply using Theorem 2.36, we obtain the following useful result. 


Theorem 2.63 For any matrix C of type (m, n) and any nonsingular square matrices
A and B of orders m and n, the rank of ACB is equal to the rank of C.
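Theorem 2.63 can be tested on small examples. In the sketch below, which is an illustration only, the rank is computed by reducing the matrix to echelon form with exact rational arithmetic; the function names rank and matmul and the sample matrices are assumptions introduced here.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Number of nonzero rows after reduction to echelon form."""
    rows = [[Fraction(x) for x in r] for r in M]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk

A = [[2, 1], [5, 3]]                       # nonsingular, order m = 2
B = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]      # nonsingular, order n = 3
C1 = [[1, 2, 3], [2, 4, 6]]                # type (2, 3), proportional rows
C2 = [[1, 2, 3], [0, 1, 4]]                # type (2, 3), independent rows

for C in (C1, C2):
    print(rank(C), rank(matmul(matmul(A, C), B)))
```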


Chapter 3 
Vector Spaces 


3.1 The Definition of a Vector Space 


Vectors on a line, in the plane, or in space play a significant role in mathematics, and 
especially in physics. Vectors represent the displacement of bodies, or their speed, 
acceleration, or the force applied to them, among many other things. 

In a course in elementary mathematics or physics, a vector is defined as a di- 
rected line segment. The word directed indicates that a direction is assigned to the 
segment, often indicated by an arrow drawn above it. Or else, perhaps, one of the 
two endpoints of the segment [A, B], say A, is called the beginning, while the other, 
B, is the end, and then the direction is given as motion from the beginning of the
segment to the end. Then two vectors $x = \overrightarrow{AB}$ and $y = \overrightarrow{CD}$ are said to be equal if
it is possible by means of parallel translation to join the segments x and y in such a
way that the beginning A of segment x coincides with the beginning C of segment
y (in which case their ends must coincide as well); see Fig. 3.1.

The fact that we consider the two different vectors in the figure to be equal 
does not represent anything unusual in mathematics or generally in human thought. 
Rather, it represents the usual method of abstraction, whereby we focus our atten- 
tion on some important property of the objects under consideration. Thus in ge- 
ometry, we consider certain triangles to be equal, even though they are drawn on 
different sheets of paper. Or in arithmetic, we might consider equal the number of 
people in a boat and the number of apples on a tree. 

It is obvious that having chosen a certain point O (on a line, in the plane, or in 
space), we can find a vector (indeed the unique one) equal to a given vector x whose 
beginning coincides with the point O. 

The laws of addition of velocities, accelerations, and forces lead to the following
definition of vector addition. The sum of vectors $x = \overrightarrow{AB}$ and $y = \overrightarrow{CD}$ is the vector
$z = \overrightarrow{AD'}$, where D′ is the end of vector $\overrightarrow{BD'}$, a vector equal to y whose beginning
coincides with the end B of the vector x; see Fig. 3.2.

If we replace all of these vectors with equal vectors but having as their beginning 
the fixed point O, then vector addition will proceed by the well-known “parallelo- 
gram law”; see Fig. 3.3. 


I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry
DOI 10.1007/978-3-642-30994-6_3, © Springer-Verlag Berlin Heidelberg 2013




Fig. 3.1 Equal vectors

Fig. 3.2 Vector summation

Fig. 3.3 The parallelogram law


There is also a definition of multiplication of a vector x by a number α. For now,
in speaking about numbers, we shall mean real numbers (we shall have something
to say later about the more general situation). If α > 0 and x is the vector $\overrightarrow{AB}$, then
the product αx is defined to be the vector $\overrightarrow{AC}$ lying on the same line as [A, B] in
such a way that the point C lies on the same side of A as the point B and such
that the segment [A, C] is α times the length of the segment [A, B]. (Note that if
α < 1, then the segment [A, C] is shorter than the segment [A, B].) Denoting by
|AB| the length of the segment [A, B], we shall express this by way of the formula
|AC| = α|AB|. However, if α < 0 and α = −β, where then β > 0, then the product
αx is defined to be the vector $\overrightarrow{CA}$, where $\beta x = \overrightarrow{AC}$.

We shall not derive the simple properties of vector addition and multiplication of 
a vector by a number. We observe only that they are amazingly similar for vectors on 
a line, in the plane, and in space. This similarity indicates that we are dealing only 
with a special case of a general concept. In this and several subsequent chapters, 
we shall present the theory of vectors and the spaces consisting of them of arbi- 
trary dimension n (including even some facts relating to spaces whose dimension is 
infinite). 

How do we formulate such a definition? In the case of vectors on a line, in the
plane, and in space, we shall use the intuitively clear concept of directed line
segment. But what if we are not convinced that our interlocutor shares the same
intuition? For example, suppose we wanted to share our knowledge with an
extraterrestrial with whom we are communicating by radio?

A technique was long ago devised for overcoming such difficulties in the sci- 
ences. It involves defining (or in our terminology, reporting to the extraterrestrial) 
not what are the objects under consideration (vectors, etc.), but the relationships be- 
tween them, or in other words, their properties. For example, in geometry, one leaves 
undefined such notions as point, line, and the property of a line passing through a 
point, and instead formulates some of their properties, for instance that between two 
distinct points there passes one and only one line. Such a method of defining new 
concepts is called axiomatic. In this course on linear algebra, the vector space will 
be the first object to be defined axiomatically. Till now, new concepts have been 
defined using constructions or formulas, such as the definition of the determinant 
of a matrix (defined either inductively, using the rule of expansion by columns, or 
derived using the rather complicated explicit formula (2.44) from Sect. 2.7). It is, 
however, possible that the reader has encountered the concepts of groups and fields, 
which are also defined axiomatically, but may not have investigated them in detail, 
in contrast to the notion of a vector space, the study of which will occupy this entire 
chapter. 

With that, we move on to the definition of a vector space. 


Definition 3.1 A vector (or linear) space is a set L (whose elements we shall call
vectors and denote by x, y, z, etc.) for which the following conditions are satisfied:


(1) There is a rule for associating with any two vectors x and y a third vector, called
their sum and denoted by x + y.

(2) There is a rule for associating with any vector x and any number α a new vector,
called the product of α and x and denoted by αx. (The numbers α by which a
vector can be multiplied, be they real, complex, or from any field K, are called
scalars.)


These operations must satisfy the following conditions: 


(a) x + y = y + x.

(b) (x + y) + z = x + (y + z).

(c) There exists a vector $\mathbf{0} \in L$ such that for an arbitrary vector x ∈ L, the sum $x + \mathbf{0}$
is equal to x (the vector $\mathbf{0}$ is called the null vector).

(d) For each vector x ∈ L, there exists a vector −x ∈ L such that $x + (-x) = \mathbf{0}$ (the
vectors x and −x are called additive inverses or opposites of each other).

(e) For an arbitrary scalar α and vectors x and y,


α(x + y) = αx + αy.


¹ Readers who are familiar with the concept of a group will be able to reformulate conditions (a)-(d)
in a compact way by saying that with respect to the operation of vector addition, the vectors
form an abelian group.


(f) For arbitrary scalars aw and f and vector x, 
(a+ B)x =ax + Bx. 


(g) Similarly, 
a (Bx) = (aB)x. 


(h) For an arbitrary vector x, 
Ix=x and 0Ox=0. 


In the last equality, the 0 on the right-hand side denotes the null vector of the space 
L, while the 0 on the left is the scalar zero (these will always be so denoted using 
lighter and heavier type). 


It is easy to prove that there is a unique null vector in L. Indeed, if there were
another null vector $0'$, then by definition, we would have the equality $0' = 0' + 0 = 0$,
from which it follows that $0' = 0$.

Using properties (a) through (d) and the uniqueness of the null vector, it is easily
proved that for an arbitrary x, there is a unique additive inverse vector $-x$ in L.

It follows from properties (f) and (h) that the vector $-x$ is obtained by multiplying
the vector x by the scalar $-1$. Indeed, since


$$x + (-1)x = 1x + (-1)x = (1 + (-1))x = 0x = 0,$$


we obtain by the uniqueness of the additive inverse that $(-1)x = -x$. Analogously,
from properties (f) and (h), it follows that for every vector x and natural number k,
the vector $kx$ is equal to the k-fold sum $x + \cdots + x$.


Remark 3.2 (On scalars and fields) We would like to make more precise what we 
mean by scalars $\alpha$, $\beta$, etc. in the definition of vector space above. The majority of
readers will probably assume that we are talking about real numbers. In this case, L
is called a real vector space. But those who are familiar with complex numbers may
choose to understand the scalars $\alpha$, $\beta$, etc., as complex. In that case, L will be called
a complex vector space. The theory developed below will be applicable in this case 
as well. Finally, the reader familiar with the concept of field may combine these two 
cases, understanding the scalars involved in the definition of a vector space to be 
elements of any field K. Then L will be called a vector space over the field K. 
Strictly speaking, this question of scalars could have been addressed in the pre- 
ceding chapters in which we discussed numbers without going into much detail. The 
answer would have been the same: by scalars, one may understand real numbers, 
complex numbers, or the elements of any field. All of our arguments apply equally 
to all three cases. The only exception is the proof of Property 2.10 from Sect. 2.2, in 
which we used the fact that from the equality 2D = 0 it followed that D = 0. A field
in which that assertion is true for every element D is called a field of characteristic²
different from 2. Nonetheless, it is possible to prove that Property 2.10 holds in the 
general case as well. 


Example 3.3 We present here a few examples of vector spaces. 


(a) The set of vectors on a line, in the plane, or in space as we have previously 
discussed. 

(b) In Sect. 2.9, we introduced the notions of addition of matrices and multiplication 
of a matrix by a number. It is easily verified that the set of matrices of a given 
type (m,n) with operations thus defined is a vector space. That conditions (a) 
through (h) are satisfied reduces to the corresponding properties of numbers. In 
particular, the set of rows (or columns) of a given length n is a vector space. 
We shall denote this space by $K^n$ if the row (or column) elements belong to the
field K. Here it is understood that if we are operating with real numbers only,
then K = R, and the space will then be denoted by $R^n$. If we are using complex
numbers, then K = C, and the vector space will be denoted by $C^n$. The reader
may choose any of these designations. 

(c) Let L be the set of all continuous functions defined on a given interval [a, b] 
taking real or complex values. We define addition of such functions and multi- 
plication by a scalar in the usual way. It is then clear that L is a vector space. 

(d) Let L be the set of all polynomials (of arbitrary degree) with real or complex 
coefficients or coefficients in a field K. Addition and multiplication by a scalar 
are defined as usual. Then it is obvious that L is a vector space. 

(e) Let L be the collection of all polynomials whose degree does not exceed a fixed
number n. Everything else is the same as in the previous example. We again
obtain a vector space (one for each value of n).
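
As an illustration of example (b), the following minimal sketch checks conditions (a) through (h) of Definition 3.1 for rows of length 3 over the rational numbers. The helper names `add`, `scale`, and `axioms_hold_on_samples` and the sample rows are assumptions made for this sketch; checking finitely many samples illustrates the axioms but of course does not prove them.

```python
from fractions import Fraction
from itertools import product

# Hypothetical helper names, chosen for this illustration only.
def add(x, y):
    """Sum of two rows of the same length, taken componentwise."""
    return tuple(a + b for a, b in zip(x, y))

def scale(alpha, x):
    """Product of the scalar alpha and the row x."""
    return tuple(alpha * a for a in x)

ZERO = (Fraction(0),) * 3                      # null vector of Q^3
ROWS = [(Fraction(1), Fraction(2), Fraction(-1)),
        (Fraction(0), Fraction(1, 2), Fraction(3)),
        (Fraction(-2), Fraction(5), Fraction(1, 3))]
SCALARS = [Fraction(0), Fraction(1), Fraction(-3, 4)]

def axioms_hold_on_samples():
    """Check conditions (a)-(h) of Definition 3.1 on the sample rows and scalars."""
    for x, y, z in product(ROWS, repeat=3):
        if add(x, y) != add(y, x):                          # (a)
            return False
        if add(add(x, y), z) != add(x, add(y, z)):          # (b)
            return False
    for x in ROWS:
        if add(x, ZERO) != x:                               # (c)
            return False
        if add(x, scale(Fraction(-1), x)) != ZERO:          # (d), with -x = (-1)x
            return False
        if scale(Fraction(1), x) != x:                      # (h), first part
            return False
        if scale(Fraction(0), x) != ZERO:                   # (h), second part
            return False
    for alpha, beta in product(SCALARS, repeat=2):
        for x, y in product(ROWS, repeat=2):
            if scale(alpha, add(x, y)) != add(scale(alpha, x), scale(alpha, y)):  # (e)
                return False
            if scale(alpha + beta, x) != add(scale(alpha, x), scale(beta, x)):    # (f)
                return False
            if scale(alpha, scale(beta, x)) != scale(alpha * beta, x):            # (g)
                return False
    return True

print(axioms_hold_on_samples())
```

Exact rational arithmetic (`Fraction`) is used so that the equalities are tested without rounding error.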


Definition 3.4 A subset $L'$ of a vector space L is called a subspace of L if for arbitrary
vectors $x, y \in L'$, their sum $x + y$ is also in $L'$, and for an arbitrary scalar $\alpha$
and vector $x \in L'$, the vector $\alpha x$ is in $L'$.

It is obvious that L’ is itself a vector space. 


Example 3.5 The space L is a subspace of itself. 


Example 3.6 The vector 0 by itself forms a subspace. It is called the zero space and 
is denoted by (0).³


²For readers familiar with the definition of a field, we can give a general definition: The characteristic
of a field K is the smallest natural number k such that the k-fold sum $kD = D + \cdots + D$ is
equal to 0 for every element $D \in K$ (as is easily seen, this number k is the same for all $D \neq 0$). If
no such natural number k exists (as in, for example, the most frequently encountered fields, K = R
and K = C), then the characteristic is defined to be zero.


³Translator’s note: It may be tempting to consider “null space” a possible synonym for the zero
space. However, that term is reserved as a synonym for “kernel,” to be introduced below, in Defi- 
nition 3.67. 




Example 3.7 Consider the space encountered in analytic geometry consisting of all 
vectors having their beginning at a certain fixed point O. Then an arbitrary line 
and an arbitrary plane passing through the point O will be subspaces of the entire 
enclosing vector space. 


Example 3.8 Consider a system of homogeneous linear equations in n unknowns 
with coefficients in the field K. Then the set of rows forming the solution set is a 
subspace $L'$ of the space $K^n$ of rows of length n. This follows from the notation
(1.10) of such a system (with $b_i = 0$) and properties (1.8) and (1.9) of linear functions.
The subspace $L'$ is called the solution subspace of the associated system of
homogeneous linear equations. The equations of the system determine the subspace 
L’ just as the equation of a line or plane does in analytic geometry. 
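
The closure properties behind Example 3.8 can be tried out on a small system. The sketch below is only an illustration: the coefficient matrix `A`, the solutions `u` and `v`, and the helper names are all chosen here as assumptions, not taken from the text.

```python
from fractions import Fraction

# Hypothetical homogeneous system in 3 unknowns:
#    x1 + 2*x2 -   x3 = 0
#  2*x1 + 4*x2 - 2*x3 = 0
A = [[1, 2, -1],
     [2, 4, -2]]

def is_solution(x):
    """True if the row x satisfies every equation of the homogeneous system."""
    return all(sum(a * xi for a, xi in zip(row, x)) == 0 for row in A)

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale(alpha, x):
    return tuple(alpha * a for a in x)

u = (1, 0, 1)     # 1 + 0 - 1 = 0
v = (-2, 1, 0)    # -2 + 2 - 0 = 0

# The solution set is closed under both operations of Definition 3.4.
sum_is_solution = is_solution(add(u, v))
multiple_is_solution = is_solution(scale(Fraction(3, 7), u))
# A row that is not a solution, for contrast: 1 + 2 - 1 = 2.
outsider_is_solution = is_solution((1, 1, 1))
```

With nonzero right-hand sides $b_i$ the same check would fail, which is why the example is restricted to homogeneous systems.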


Example 3.9 In the space of all polynomials, the collection of all polynomials with 
degree at most n (for any fixed number n) is a subspace.


Definition 3.10 A space L is called the sum of a collection of its subspaces
$L_1, L_2, \ldots, L_k$ if every vector $x \in L$ can be written in the form


$$x = x_1 + x_2 + \cdots + x_k, \quad \text{where } x_i \in L_i. \tag{3.1}$$


In that case, we write


$$L = L_1 + L_2 + \cdots + L_k.$$


Definition 3.11 A space L is called the direct sum of its subspaces $L_1, L_2, \ldots, L_k$ if
it is the sum of these subspaces and in addition, for every vector $x \in L$, the representation
(3.1) is unique. In this case, we write


$$L = L_1 \oplus L_2 \oplus \cdots \oplus L_k. \tag{3.2}$$


Example 3.12 The space that we considered in Example 3.7 is the sum of two planes 
if they do not coincide; it is the sum of a line and plane if the line is not contained 
in the given plane; it is the sum of three lines if they do not belong to a common 
plane. In the second and third cases, the sum will be a direct sum. In the case of 
two planes, it is easily seen that the representation (3.1) is not unique. For example, 
we can represent the null vector as a sum of two vectors that are additive inverses 
of each other lying on the line that is obtained as the intersection of the two given 
planes. 


Example 3.13 Let us denote by $L_i$ the vector space consisting of all monomials of
degree i. Then the space L of polynomials of degree at most n can be represented as
the direct sum $L = L_0 \oplus L_1 \oplus \cdots \oplus L_n$. This follows from the fact that an arbitrary
polynomial is uniquely determined by its coefficients.




Lemma 3.14 Suppose the vector space L is the sum of certain of its subspaces
$L_1, L_2, \ldots, L_k$. Then in order for L to be a direct sum of these subspaces, it is necessary
and sufficient that the relationship


$$x_1 + x_2 + \cdots + x_k = 0, \quad x_i \in L_i, \tag{3.3}$$
hold only if all the $x_i$ are equal to 0.


Proof The necessity of condition (3.3) is clear, since for the vector $0 \in L$, the equality
$0 = 0 + \cdots + 0$, in which the null vector of the subspace $L_i$ stands in the ith
place, is a representation of type (3.1), and the presence of another equality of the
form (3.3) would contradict the definition of direct sum. To prove the sufficiency of
the condition (3.3), if there are two representations (3.1),


$$x = x_1 + x_2 + \cdots + x_k, \qquad x = y_1 + y_2 + \cdots + y_k,$$


then it suffices to subtract one from the other. This gives
$(x_1 - y_1) + \cdots + (x_k - y_k) = 0$ with $x_i - y_i \in L_i$, so that by condition (3.3),
$x_i = y_i$ for all i, and the representation (3.1) is unique, as the definition of direct
sum requires.


We observe that if $L_1, L_2, \ldots, L_k$ are subspaces of a vector space L, then their
intersection $L_1 \cap L_2 \cap \cdots \cap L_k$ is also a subspace of L, since it satisfies all the requirements
in the definition of subspace. In the case k = 2, Lemma 3.14 allows
us to obtain in the following corollary another, more graphic, criterion for the sum
of subspaces to be a direct sum.


Corollary 3.15 Suppose the vector space L is the sum of two of its subspaces $L_1$
and $L_2$. Then in order that L be a direct sum, it is necessary and sufficient that one
have the equality $L_1 \cap L_2 = (0)$.


Proof By Lemma 3.14, L is the direct sum of its subspaces $L_1$ and $L_2$ if and only if
the equation $x_1 + x_2 = 0$, where $x_1 \in L_1$ and $x_2 \in L_2$, is satisfied only if $x_1 = 0$ and
$x_2 = 0$. But from $x_1 + x_2 = 0$, it follows that the vector $x_1 = -x_2$ is contained in
both subspaces $L_1$ and $L_2$, whence it follows that it is contained in the intersection
$L_1 \cap L_2$. Therefore, the condition $L = L_1 \oplus L_2$ is equivalent to the satisfaction of the
two conditions $L = L_1 + L_2$ and $L_1 \cap L_2 = (0)$, which completes the proof.


We observe that the last assertion cannot be generalized to an arbitrary number
of subspaces $L_1, \ldots, L_k$. For example, suppose that L is the plane consisting of all
vectors with origin at O, and suppose that $L_1, L_2, L_3$ are three distinct lines in this
plane passing through O. It is clear that the intersection of any two of these lines
consists of only the zero vector, and so a fortiori, $L_1 \cap L_2 \cap L_3 = (0)$. The plane L
is the sum of its subspaces $L_1, L_2, L_3$, but it is not the direct sum, since it is obvious
that one can produce the equality $x_1 + x_2 + x_3 = 0$ for nonnull vectors $x_i \in L_i$.
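
A concrete instance of this counterexample can be written down in coordinates. In the sketch below, the three lines are assumed to be spanned by the direction vectors `d1`, `d2`, `d3` in the coordinate plane; these particular vectors and the helper `det2` are choices made for the illustration only.

```python
def det2(u, v):
    """Determinant of the 2x2 matrix with rows u and v; nonzero iff u, v are not proportional."""
    return u[0] * v[1] - u[1] * v[0]

# Direction vectors of three distinct lines L1, L2, L3 through the origin.
d1, d2, d3 = (1, 0), (0, 1), (1, 1)

# Pairwise nonproportional directions: any two of the lines meet only in the zero vector.
pairwise_trivial = all(det2(u, v) != 0 for u, v in [(d1, d2), (d1, d3), (d2, d3)])

# Nonnull vectors x1 in L1, x2 in L2, x3 in L3 whose sum is the null vector.
x1, x2, x3 = (1, 0), (0, 1), (-1, -1)      # x3 = (-1) * d3
total = tuple(a + b + c for a, b, c in zip(x1, x2, x3))

print(pairwise_trivial, total)
```

The relation $x_1 + x_2 + x_3 = 0$ with all $x_i \neq 0$ shows, by Lemma 3.14, that the sum is not direct even though all pairwise intersections are (0).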

It is easy to see that if equality (3.2) is satisfied, then there exists a bijection
between the set of vectors $x \in L$ and the set $L_1 \times \cdots \times L_k$, the product of the sets
$L_1, \ldots, L_k$ (see the definition on page xvi). This observation provides a method for
constructing the direct sum of vector spaces that are not, so to speak, originally
subspaces of a larger enclosing space and even have perhaps completely different
structures from one another.

Let $L_1, \ldots, L_k$ be vector spaces. Just as for any other sets, we can define their
product $L = L_1 \times \cdots \times L_k$, which in this case is not yet a vector space. However, it is
easy to make it into one by defining the sum and the product by a scalar according
to the following formulas:


$$(x_1, \ldots, x_k) + (y_1, \ldots, y_k) = (x_1 + y_1, \ldots, x_k + y_k),$$


$$\alpha(x_1, \ldots, x_k) = (\alpha x_1, \ldots, \alpha x_k),$$


for all vectors $x_i \in L_i$, $y_i \in L_i$, $i = 1, \ldots, k$, and an arbitrary scalar $\alpha$.

A simple verification shows that the operations defined in this way satisfy
all the conditions in the definition of a vector space, and the set $L = L_1 \times \cdots \times L_k$
becomes a vector space containing $L_1, \ldots, L_k$ among its subspaces. If we wish to be
technically precise, then the subspaces of L are not the $L_i$ themselves, but the sets
$L'_i = (0) \times \cdots \times L_i \times \cdots \times (0)$, where $L_i$ stands in the ith place, with the zero space
at all the remaining places. However, we shall close our eyes to this
circumstance, identifying $L'_i$ with $L_i$ itself.⁴ It is clear, then, that condition (3.2) is
satisfied. Thus, for arbitrary mutually independent vector spaces $L_1, \ldots, L_k$ it is always
possible to construct a space L containing all the $L_i$ as subspaces that is their
direct sum; that is, $L = L_1 \oplus \cdots \oplus L_k$.
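
The construction of the product space can be sketched in a few lines. Here each $L_i$ is assumed, purely for illustration, to be a space of rows, and an element of the product is a tuple of such rows; the names `add`, `scale`, and `embed` are hypothetical.

```python
def add_rows(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale_row(alpha, x):
    return tuple(alpha * a for a in x)

def add(X, Y):
    """(x_1, ..., x_k) + (y_1, ..., y_k) = (x_1 + y_1, ..., x_k + y_k)."""
    return tuple(add_rows(x, y) for x, y in zip(X, Y))

def scale(alpha, X):
    """alpha (x_1, ..., x_k) = (alpha x_1, ..., alpha x_k)."""
    return tuple(scale_row(alpha, x) for x in X)

def embed(i, x, zeros):
    """Copy of x from L_i inside the product: x in the ith place, null vectors elsewhere."""
    return tuple(x if j == i else z for j, z in enumerate(zeros))

# L_1 = rows of length 2, L_2 = rows of length 1 (two spaces of different structure).
zeros = ((0, 0), (0,))
X = ((1, 2), (5,))

# Every element of the product is the sum of its embedded components.
parts = [embed(i, x, zeros) for i, x in enumerate(X)]
recombined = parts[0]
for p in parts[1:]:
    recombined = add(recombined, p)

print(parts, recombined)
```

The decomposition into embedded components is forced by the entries of the tuple, which reflects the uniqueness required in the definition of a direct sum.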


Example 3.16 Let $L_1$ be the vector space considered in Example 3.7, that is, the
physical space that surrounds us, and let $L_2 = R$ be the real line, considered as the
time axis. Operating as described above, we can define the direct sum $L = L_1 \oplus L_2$.

The vectors of the space L thus constructed are called space-time events and have
the form $(x, t)$, where $x \in L_1$ is the space component, and $t \in L_2$ is the time component.
For the addition of such vectors, the space components are added among
themselves (as vectors in physical space, for example, according to the parallelo- 
gram law), while the time components are added to one another (as real numbers). 
Multiplication by a scalar is defined analogously. This space plays an important 
role in physics, in particular in the theory of relativity, where it is called Minkowski 
space. We remark that we still need to introduce some additional structure, namely 
a particular quadratic form. We shall return to this question in Sect. 7.7 (see p. 268). 


3.2 Dimension and Basis 


In this section we shall use the notion of linear combination, which in the case of
a space of rows (or row space) of length n has already been introduced (see the
definition on p. 57). We shall now repeat that definition practically verbatim. In
preparation, we observe that applying repeatedly the operations of vector addition
and multiplication of a vector by a scalar, we can form more complex expressions,
such as $\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_m x_m$, which, moreover, according to properties (a)
and (b) of the definition of vector space, do not depend on the order of terms or the
arrangement of parentheses (which is necessary in order that we be able to combine
not only two vectors, but m of them).


⁴More precisely, this identification is achieved with the help of the concept of isomorphism of
vector spaces, which will be introduced below, in Sect. 3.5.


Definition 3.17 In the vector space L, let $x_1, x_2, \ldots, x_m$ be m vectors. A vector y
is called a linear combination of these m vectors if


$$y = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_m x_m, \tag{3.4}$$


for some scalars $\alpha_1, \alpha_2, \ldots, \alpha_m$.


The collection of all vectors that are linear combinations of some given vectors
$x_1, x_2, \ldots, x_m$, that is, those having the form (3.4) for all possible $\alpha_1, \alpha_2, \ldots, \alpha_m$,
clearly satisfies the definition of a subspace. This subspace is called the linear span
of the vectors $x_1, x_2, \ldots, x_m$ and is denoted by $\langle x_1, x_2, \ldots, x_m \rangle$. It is clear that


$$\langle x_1, x_2, \ldots, x_m \rangle = \langle x_1 \rangle + \langle x_2 \rangle + \cdots + \langle x_m \rangle. \tag{3.5}$$


Definition 3.18 Vectors $x_1, x_2, \ldots, x_m$ are called linearly dependent if there exists
a linear combination (3.4) equal to 0 not all of whose coefficients $\alpha_1, \alpha_2, \ldots, \alpha_m$ are
equal to zero. Otherwise, $x_1, x_2, \ldots, x_m$ are said to be linearly independent.


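Thus vectors $x_1, x_2, \ldots, x_m$ are linearly dependent if for some scalars $\alpha_1, \alpha_2,
\ldots, \alpha_m$, one has


$$\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_m x_m = 0, \tag{3.6}$$


with at least one $\alpha_i$ not equal to 0. For example, the vectors $x_1$ and $x_2 = -x_1$ are
linearly dependent. Conversely, the vectors $x_1, x_2, \ldots, x_m$ are linearly independent
if (3.6) holds only for $\alpha_1 = \alpha_2 = \cdots = \alpha_m = 0$. In this case, the sum (3.5) is a direct
sum, that is,


$$\langle x_1, x_2, \ldots, x_m \rangle = \langle x_1 \rangle \oplus \langle x_2 \rangle \oplus \cdots \oplus \langle x_m \rangle.$$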


Here is a useful reformulation: Vectors $x_1, x_2, \ldots, x_m$ are linearly dependent if
and only if one of them is a linear combination of the others. Indeed, if


$$x_i = \alpha_1 x_1 + \cdots + \alpha_{i-1} x_{i-1} + \alpha_{i+1} x_{i+1} + \cdots + \alpha_m x_m, \tag{3.7}$$


then we have the relationship (3.6) with $\alpha_i = -1$. Conversely, if in (3.6), the coefficient
$\alpha_i$ is not equal to 0, then if we transfer the term $\alpha_i x_i$ to the right-hand side and
multiply both sides of the equality by the scalar $-\alpha_i^{-1}$, we obtain a representation
of $x_i$ as a linear combination of $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_m$.
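
For rows with rational entries, linear dependence can be tested mechanically. The following sketch is one possible approach, not a method stated in the text at this point: it applies Gaussian elimination to the given rows and counts the nonzero rows that remain, so that the rows are linearly independent exactly when none of them is eliminated. The function names `pivot_count` and `linearly_independent` are assumptions of this sketch.

```python
from fractions import Fraction

def pivot_count(rows):
    """Number of nonzero rows left after Gaussian elimination over the rationals."""
    m = [[Fraction(a) for a in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0                                   # index of the next pivot row
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):      # clear column c below the pivot
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def linearly_independent(vectors):
    """True if no row is a linear combination of the others."""
    return pivot_count(vectors) == len(vectors)

x1 = (1, 2, 3)
print(linearly_independent([x1, tuple(-a for a in x1)]))      # x2 = -x1
print(linearly_independent([(1, 2, 3), (0, 1, 1), (1, 3, 4)]))  # third = first + second
print(linearly_independent([(1, 0), (0, 1)]))
```

The first two calls involve a vector that is a linear combination of the others, in line with the reformulation above; the third uses two rows that are not proportional.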

We are finally in a position to formulate the main definition of this section (and
perhaps of the entire chapter).


Definition 3.19 The dimension of a vector space L is the largest number of linearly 
independent vectors in the space, if such a number exists. The dimension of a vector 
space is denoted by dim L, and if the greatest number of linearly independent vectors 
is finite, the space L is said to be finite-dimensional. If there is no maximum number 
of linearly independent vectors in L, then the space is said to be infinite-dimensional. 
The dimension of the vector space (0) is by definition equal to zero. 


Thus the dimension of a vector space is equal to the natural number n if the
space contains n linearly independent vectors and every set of m vectors for m > n
is linearly dependent. A vector space is infinite-dimensional if there is a collection
of n linearly independent vectors for every natural number n. Employing standard
terminology, we shall call a space of dimension 1 a line and a space of dimension 2
a plane.


Example 3.20 It is well known from elementary geometry (or from a course in 
analytic geometry) that vectors on a line, in the plane, or in the physical space that 
surrounds us form vector spaces of dimension 1, 2, and 3. This is the principal 
intuitive basis of the general definition of dimensionality. 


Example 3.21 The space of all polynomials in the variable t is clearly infinite-dimensional,
since for an arbitrary number n, the polynomials $1, t, t^2, \ldots, t^{n-1}$ are
linearly independent. The space of all continuous functions on the interval [a, b] is
a fortiori infinite-dimensional.


The dimension of a vector space L depends not only on the set itself whose ele- 
ments are the vectors of L, but also on the field over which it is defined. This will be 
made clear in the following examples. 


Example 3.22 Let $L_1$ be the space whose vectors are the complex numbers, defined
over the field C. The operations of vector addition and multiplication by a scalar will
be defined as the usual operations of addition and multiplication of complex numbers.
Then it is easily seen from the definition that $\dim L_1 = 1$. If we now consider
the vector space $L_2$ likewise consisting of the complex numbers, but defined over the
field R, then we obtain $\dim L_2 = 2$. This, as we shall see, follows from the fact that
every complex number is uniquely defined by a pair of real numbers (its real and
imaginary parts). The frequently encountered expression “complex plane” implies
the two-dimensional space $L_2$ over the field R, while the expression “complex line”
indicates the one-dimensional space $L_1$ over the field C.
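
The dependence of dimension on the field of scalars in Example 3.22 can be made tangible with the vectors 1 and i. The sketch below is only an illustration using Python's built-in `complex` type; the names `relation_over_C` and `real_combination` are hypothetical.

```python
one = complex(1, 0)
i = complex(0, 1)

# Over C, complex coefficients are allowed, and the relation i*1 + (-1)*i = 0
# has nonzero coefficients: the vectors 1 and i are linearly dependent.
relation_over_C = 1j * one + (-1) * i

# Over R, only real coefficients a, b are allowed, and a*1 + b*i is the complex
# number with real part a and imaginary part b, which vanishes only for a = b = 0.
def real_combination(a, b):
    return a * one + b * i

z = real_combination(2.0, 3.0)
print(relation_over_C == 0, z.real, z.imag)
```

So the same pair of vectors is dependent in $L_1$ and independent in $L_2$, in agreement with $\dim L_1 = 1$ and $\dim L_2 = 2$.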


Example 3.23 Let L be the vector space consisting of the real numbers, but defined
over the field Q of rational numbers (it is easy to see that all the conditions for the
definition of a vector space are satisfied). In this case, in a linear combination (3.4),
vectors $x_i$ and y are real numbers, while $\alpha_i$ is a rational number. By properties of
sets of numbers proved in a course in real analysis, it follows that the space L is
infinite-dimensional. Indeed, if the dimension of L were some finite number n, then
as we shall prove below, it would imply that there exist numbers $x_1, \ldots, x_n \in R$
such that an arbitrary $y \in R$ could be written as a linear combination (3.4) with
suitable coefficients $\alpha_1, \ldots, \alpha_n$ from the field Q. But that would imply that the set
of real numbers is countable, which, as is known from real analysis, is not the case.


It is obvious that the dimension of a subspace L’ of a vector space L cannot be 
greater than the dimension of the entire space L. 


Theorem 3.24 If the dimension of a subspace $L'$ of a vector space L is equal to the
dimension of L, then the subspace $L'$ is equal to all of L.


Proof Suppose $\dim L' = \dim L = n$. Then in $L'$ one could find n linearly independent
vectors $x_1, \ldots, x_n$. If $L' \neq L$, then in L there would be some vector $x \notin L'$. Since
$\dim L = n$, it follows that any n + 1 vectors in this space are linearly dependent.
In particular, the vectors $x_1, \ldots, x_n, x$ are linearly dependent. That is, there is a
relationship


$$\alpha_1 x_1 + \cdots + \alpha_n x_n + \alpha x = 0$$


with not all coefficients equal to zero. If we had $\alpha = 0$, then this would yield the
linear dependence of the vectors $x_1, \ldots, x_n$, which are linearly independent by assumption.
This means that $\alpha \neq 0$ and $x = \beta_1 x_1 + \cdots + \beta_n x_n$, $\beta_i = -\alpha^{-1}\alpha_i$, from
which it follows that x is a linear combination of the vectors $x_1, \ldots, x_n$. It clearly
follows from the definition of a subspace that a linear combination of vectors in $L'$
is itself a vector in $L'$. Hence we have $x \in L'$, contrary to the choice of x, and so $L' = L$.


If the dimension of a vector space L is finite, $\dim L = n$, and a subspace $L' \subset L$
has dimension $n - 1$, then $L'$ is called a hyperplane in L.

There is a defect in the definition of dimension given above: it is not effective. 
Theoretically, in order to determine the dimension of a vector space, it would be 
necessary to look at all systems of vectors $x_1, \ldots, x_m$ for various m in the space
and determine whether each is linearly independent. With such a method, it is not 
so simple to determine the dimension of the row space of length n or of the space 
of polynomials of degree less than or equal to n. Therefore, we shall investigate the 
notion of dimension in greater detail. 


Definition 3.25 Vectors $e_1, \ldots, e_n$ of a vector space L are called a basis if they
are linearly independent and every vector in the space L can be written as a linear
combination of these vectors.


Thus if $e_1, \ldots, e_n$ is a basis of the space L, then for an arbitrary vector $x \in L$
there exists an expression of the form


$$x = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n. \tag{3.8}$$


Theorem 3.26 For an arbitrary vector x, the expression (3.8) is unique. 




Proof This is a direct consequence of the fact that the vectors $e_1, \ldots, e_n$ form a
basis. Let us assume that there are two expressions


$$x = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n, \qquad x = \beta_1 e_1 + \beta_2 e_2 + \cdots + \beta_n e_n.$$
Subtracting one equality from the other, we obtain


$$(\alpha_1 - \beta_1)e_1 + (\alpha_2 - \beta_2)e_2 + \cdots + (\alpha_n - \beta_n)e_n = 0.$$


But since the vectors $e_1, \ldots, e_n$ form a basis, then by definition, they are linearly
independent. From this it follows that $\alpha_1 = \beta_1$, $\alpha_2 = \beta_2$, ..., $\alpha_n = \beta_n$, as was to be
proved.

Corollary 3.27 If $e_1, \ldots, e_n$ is a basis of the vector space L, then L can be written
in the form


$$L = \langle e_1 \rangle \oplus \langle e_2 \rangle \oplus \cdots \oplus \langle e_n \rangle.$$


Definition 3.28 The numbers $\alpha_1, \ldots, \alpha_n$ in the expression (3.8) are called the coordinates
of the vector x with respect to the basis $e_1, \ldots, e_n$ (or coordinates in that
basis).


Example 3.29 An arbitrary vector $e \neq 0$ on a line (that is, a one-dimensional vector
space) forms a basis of the line. For an arbitrary vector x on the same line, we
have the expression (3.8), which in the given case takes the form $x = \alpha e$ with some
scalar $\alpha$. This $\alpha$ is the coordinate (in this case the only one) of the vector x in the
basis e. If $e' \neq 0$ is another vector on the same line, then it provides another basis.
We have seen that $e' = ce$ for some scalar $c \neq 0$ (since $e' \neq 0$). Therefore, from the
relationship $x = \alpha e$ we obtain that $x = \alpha c^{-1} e'$. Thus in the basis $e'$, the coordinate
of the vector x is equal to $\alpha c^{-1}$.
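
A quick numerical check of this change of coordinate may help. In the sketch below, the line is assumed to be spanned by the row `e` in the rational plane, and the values of `alpha` and `c` are arbitrary choices made for the illustration.

```python
from fractions import Fraction

def scale(alpha, x):
    return tuple(alpha * a for a in x)

e = (Fraction(2), Fraction(4))     # a nonzero vector spanning the line
alpha = Fraction(3)                # coordinate of x in the basis e
x = scale(alpha, e)                # x = alpha * e

c = Fraction(1, 2)                 # e' = c * e with c != 0
e_prime = scale(c, e)
alpha_prime = alpha / c            # alpha * c^{-1}, the coordinate of x in the basis e'

same_vector = (x == scale(alpha_prime, e_prime))
print(alpha, alpha_prime, same_vector)
```

The vector x is unchanged, while its coordinate changes from $\alpha$ to $\alpha c^{-1}$ when the basis vector is rescaled by c.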


Thus we have seen that the coordinates of a vector x depend not only on the vector
itself, but on the basis that we use (in the general case, $e_1, \ldots, e_n$). Consequently,
the coordinates of a vector are not an “intrinsic geometric” property. The situation 
here is similar to the measurement of physical quantities: the length of a line seg- 
ment or the mass of a body. Neither the one nor the other can be characterized by a 
number. It is necessary as well to have a unit of measurement: in the first case, the 
meter, centimeter, etc.; in the second, the kilogram, gram, etc. We shall encounter 
such a phenomenon repeatedly: some object (such as, for example, a vector) cannot 
be defined “in and of itself” by some set or other of numbers; rather, something 
similar to a unit of measurement (in our case, a basis) must be chosen. Here, there 
are always two possible points of view: either to choose some method of associat- 
ing numbers with the object or to limit oneself to the study of its “purely intrinsic” 
properties, independent of the method of association. For example, in physics, we 
are interested in physical quantities themselves, but the laws of nature are usually 
expressed in the form of mathematical relationships among the numbers that characterize
them. We will try to reconcile both points of view after defining how the
numbers that characterize the object change under different methods of associating
numbers with the object. In particular, in Sect. 3.4, we shall consider the question
of how the coordinates of a vector change under a change of basis.

In terms of the coordinates of vectors (relative to an arbitrary basis $e_1, \ldots, e_n$),
it is easy to express the operations that enter into the definition of a vector space,
namely the addition of vectors and the multiplication of a vector by a scalar. Namely,
if x and y are two vectors, and


$$x = \alpha_1 e_1 + \cdots + \alpha_n e_n, \qquad y = \beta_1 e_1 + \cdots + \beta_n e_n,$$
then


$$
\begin{aligned}
x + y &= (\alpha_1 e_1 + \cdots + \alpha_n e_n) + (\beta_1 e_1 + \cdots + \beta_n e_n)\\
&= (\alpha_1 + \beta_1)e_1 + \cdots + (\alpha_n + \beta_n)e_n,
\end{aligned}
\tag{3.9}
$$


and for an arbitrary scalar $\alpha$,
$$\alpha x = \alpha(\alpha_1 e_1 + \cdots + \alpha_n e_n) = (\alpha\alpha_1)e_1 + \cdots + (\alpha\alpha_n)e_n, \tag{3.10}$$


so that the coordinates of vectors under addition are added, and under multiplication
by a scalar, they are multiplied by that scalar.

It follows from the definition of a basis that if $\dim L = n$ and $e_1, \ldots, e_n$ is any set
of n linearly independent vectors in L, then they form a basis of L. Indeed, it suffices
to verify that an arbitrary vector $x \in L$ can be written as a linear combination of
these vectors. But from the definition of dimension, the n + 1 vectors $x, e_1, \ldots, e_n$ are
linearly dependent, that is,


$$\beta x + \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n = 0$$


for some scalars $\beta, \alpha_1, \alpha_2, \ldots, \alpha_n$, not all equal to zero. In this case, $\beta \neq 0$, for otherwise, this would
contradict the linear independence of the vectors $e_1, \ldots, e_n$. But then


$$x = -\beta^{-1}\alpha_1 e_1 - \beta^{-1}\alpha_2 e_2 - \cdots - \beta^{-1}\alpha_n e_n,$$


which was to be proved.

From the definition, it follows that if the dimension of a vector space L is equal 
to n, then there exist n linearly independent vectors in L, which by what we have 
proved, form a basis. Now we shall establish a more general fact. 


Theorem 3.30 If $e_1, \ldots, e_m$ are linearly independent vectors in a vector space L of
finite dimension n, then this set of vectors can be extended to a basis of L, that is,
there exist vectors $e_i$, $m < i \le n$, such that $e_1, \ldots, e_m, e_{m+1}, \ldots, e_n$ is a basis of L.


Proof If the vectors $e_1, \ldots, e_m$ already form a basis, then m = n, and the theorem
is proved. If they do not form a basis, then clearly m < n, and there exists a
vector $e_{m+1}$ in L that is not a linear combination of $e_1, \ldots, e_m$. Thus the vectors
$e_1, \ldots, e_{m+1}$ are linearly independent. Indeed, if they were linearly dependent, we
would have the relationship


$$\alpha_1 e_1 + \cdots + \alpha_m e_m + \alpha_{m+1} e_{m+1} = 0, \tag{3.11}$$

in which not all the $\alpha_1, \ldots, \alpha_{m+1}$ were equal to zero. Now we must have $\alpha_{m+1} \neq 0$,
since otherwise we would have to infer that the vectors $e_1, \ldots, e_m$ were linearly dependent.
But then from (3.11) we obtain that $e_{m+1} = \beta_1 e_1 + \cdots + \beta_m e_m$, where
$\beta_i = -\alpha_{m+1}^{-1}\alpha_i$, that is, the vector $e_{m+1}$ is a linear combination of the vectors
$e_1, \ldots, e_m$, contradicting our assumption.

The same reasoning can be applied to the system of vectors $e_1, \ldots, e_{m+1}$. Continuing
in this way, we will obtain a system containing an ever increasing number of
linearly independent vectors, and sooner or later, we will have to stop the process,
since the dimension of the space L is finite. But then every vector of the space L will
be a linear combination of the linearly independent vectors of our enlarged system.
That is, we will have produced a basis.
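
The process used in this proof can be sketched for rows of length n with rational entries. The version below is an illustration under stated assumptions rather than the argument of the proof itself: instead of an arbitrary vector outside the span, it tries the rows (1, 0, ..., 0), (0, 1, ..., 0), ... in turn and keeps each one that preserves linear independence. The helper names `pivot_count` and `extend_to_basis` are hypothetical.

```python
from fractions import Fraction

def pivot_count(rows):
    """Number of nonzero rows left after Gaussian elimination over the rationals."""
    m = [[Fraction(a) for a in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def extend_to_basis(vectors, n):
    """Extend linearly independent rows of length n to n linearly independent rows."""
    basis = [tuple(v) for v in vectors]
    for i in range(n):
        if len(basis) == n:
            break                                   # nothing more can be added
        candidate = tuple(1 if j == i else 0 for j in range(n))
        # Keep the candidate only if it is not a linear combination of the rows so far.
        if pivot_count(basis + [candidate]) == len(basis) + 1:
            basis.append(candidate)
    return basis

basis = extend_to_basis([(1, 1, 0)], 3)
print(basis)
```

Every row that is skipped already lies in the span of the rows kept so far, so the final system spans all of the row space and, being linearly independent, is a basis containing the original vectors.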


In the situation under consideration in Theorem 3.30, we shall say that the system
of vectors $e_1, \ldots, e_m$ has been augmented to the basis $e_1, \ldots, e_n$. As an easy
verification shows, this is equivalent to the relationship


$$\langle e_1, \ldots, e_n \rangle = \langle e_1, \ldots, e_m \rangle \oplus \langle e_{m+1}, \ldots, e_n \rangle. \tag{3.12}$$


Corollary 3.31 For an arbitrary subspace $L' \subset L$ of the finite-dimensional vector
space L, there exists a subspace $L'' \subset L$ such that $L = L' \oplus L''$.


Proof It suffices to take any basis $e_1, \ldots, e_m$ of $L'$, augment it to a basis $e_1, \ldots, e_n$
of the space L, and set $L = \langle e_1, \ldots, e_n \rangle$, $L' = \langle e_1, \ldots, e_m \rangle$, and $L'' = \langle e_{m+1}, \ldots, e_n \rangle$
in (3.12).


We shall now prove an assertion that is the central point of the entire theory. 
Therefore, we shall present two proofs (although they are, in fact, based on the 
same principle). 


Lemma 3.32 More than n linear combinations of n vectors in an arbitrary vector 
space are of necessity linearly dependent. 


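Proof First proof. Let us write down explicitly just what has to be proved. Suppose
we are given n vectors $x_1, \ldots, x_n$ and m linear combinations of them $y_1, \ldots, y_m$,
where m > n. Then we have the relationships


$$
\begin{aligned}
y_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n,\\
y_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n,\\
&\;\;\vdots\\
y_m &= a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{aligned}
\tag{3.13}
$$


for certain scalars $a_{ij}$. We now have to find scalars $\alpha_1, \ldots, \alpha_m$, not all of them equal
to zero, such that


$$\alpha_1 y_1 + \alpha_2 y_2 + \cdots + \alpha_m y_m = 0.$$


Substituting here (3.13) and collecting like terms, we obtain


$$
\begin{aligned}
&(\alpha_1 a_{11} + \alpha_2 a_{21} + \cdots + \alpha_m a_{m1})x_1 + (\alpha_1 a_{12} + \alpha_2 a_{22} + \cdots + \alpha_m a_{m2})x_2\\
&\qquad + \cdots + (\alpha_1 a_{1n} + \alpha_2 a_{2n} + \cdots + \alpha_m a_{mn})x_n = 0.
\end{aligned}
$$


This equality will be satisfied if all the coefficients of the vectors $x_1, \ldots, x_n$ are
equal to zero, that is, if the equations


$$
\begin{aligned}
a_{11}\alpha_1 + a_{21}\alpha_2 + \cdots + a_{m1}\alpha_m &= 0,\\
a_{12}\alpha_1 + a_{22}\alpha_2 + \cdots + a_{m2}\alpha_m &= 0,\\
&\;\;\vdots\\
a_{1n}\alpha_1 + a_{2n}\alpha_2 + \cdots + a_{mn}\alpha_m &= 0
\end{aligned}
$$


are satisfied. Since m > n by assumption, we have n homogeneous equations in
more than n unknowns, namely $\alpha_1, \ldots, \alpha_m$. By Corollary 1.11, this system has a
nontrivial solution $\alpha_1, \ldots, \alpha_m$, which gives the assertion of the lemma.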

Second proof. This proof will be by induction on n and based on formula (3.13).
The base case n = 1 of the induction is obvious: any m vectors proportional to the
given vector $x_1$ will be linearly dependent if m > 1.

Now let us consider the case of arbitrary n > 1. In formula (3.13), suppose that
the coefficient $a_{11}$ is not equal to 0. We may make this assumption with no loss
of generality. Indeed, if in formula (3.13), all coefficients satisfy $a_{ij} = 0$, then all
the vectors $y_1, \ldots, y_m$ are equal to 0, and the lemma is true (trivially). But if
at least one coefficient $a_{ij}$ is not equal to 0, then by changing the numeration of
the vectors $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$, we can move this coefficient to the upper
left-hand corner and assume that $a_{11} \neq 0$. Let us now subtract from the vectors
$y_2, \ldots, y_m$ the vector $y_1$ with a coefficient such that in the relationships (3.13),
the vector $x_1$ is eliminated. After this, we obtain the vectors $y_2 - \gamma_2 y_1, \ldots, y_m - \gamma_m y_1$,
where $\gamma_2 = a_{11}^{-1}a_{21}, \ldots, \gamma_m = a_{11}^{-1}a_{m1}$. These m − 1 vectors are already linear
combinations of the n − 1 vectors $x_2, \ldots, x_n$. Since we are using induction on n, and
m − 1 > n − 1, we may assume the lemma to be true in this case. This means that there exist numbers
$\alpha_2, \ldots, \alpha_m$, not all zero, such that $\alpha_2(y_2 - \gamma_2 y_1) + \cdots + \alpha_m(y_m - \gamma_m y_1) = 0$, that
is,


$$-(\gamma_2\alpha_2 + \cdots + \gamma_m\alpha_m)y_1 + \alpha_2 y_2 + \cdots + \alpha_m y_m = 0,$$


which means that the vectors $y_1, \ldots, y_m$ are linearly dependent.


It was apparent that in the second proof, we used the method of Gaussian elimi- 
nation, which was used to prove Theorem 1.10, which served as a basis of the first 
proof. Thus both proofs are based on the same idea. 
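
The first proof is constructive enough to be tried on a small case. The sketch below assumes n = 2 vectors and m = 3 linear combinations with hypothetical coefficients $a_{ij}$; it forms the homogeneous system for $\alpha_1, \ldots, \alpha_m$ and solves it by Gaussian elimination, setting one free unknown equal to 1. The function name `nontrivial_solution` is an assumption of this sketch.

```python
from fractions import Fraction

def nontrivial_solution(equations, unknowns):
    """Nonzero solution of a homogeneous system with more unknowns than equations."""
    m = [[Fraction(a) for a in row] for row in equations]
    pivot_cols = []                     # pivot_cols[r] is the pivot column of row r
    r = 0
    for c in range(unknowns):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [a / lead for a in m[r]]             # make the pivot equal to 1
        for i in range(len(m)):                     # clear column c in all other rows
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
    # Fewer equations than unknowns, so some column has no pivot.
    free = next(c for c in range(unknowns) if c not in pivot_cols)
    solution = [Fraction(0)] * unknowns
    solution[free] = Fraction(1)
    for row_index, c in enumerate(pivot_cols):
        solution[c] = -m[row_index][free]
    return solution

# Hypothetical coefficients: y_i = a[i][0]*x_1 + a[i][1]*x_2 for i = 1, 2, 3.
a = [[1, 2],
     [3, 4],
     [5, 6]]
m_count, n = len(a), len(a[0])

# One equation for each x_j: a_{1j} alpha_1 + ... + a_{mj} alpha_m = 0.
equations = [[a[i][j] for i in range(m_count)] for j in range(n)]
alphas = nontrivial_solution(equations, m_count)

# Coefficient of each x_j in alpha_1 y_1 + ... + alpha_m y_m; all must vanish.
relation = [sum(alphas[i] * a[i][j] for i in range(m_count)) for j in range(n)]
print(alphas, relation)
```

Because the check is carried out on the coefficients $a_{ij}$ alone, the resulting relation among $y_1, y_2, y_3$ holds whatever the vectors $x_1, x_2$ are.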




The connection between the notions of basis and dimension is made apparent in 
the following result. 


Theorem 3.33 If a vector space L has a basis of n vectors, then its dimension is n.


Proof The proof of the theorem follows easily from the lemma. Let $e_1, \ldots, e_n$ be a
basis of the space L. We shall show that $\dim L = n$. In this space, there are n linearly
independent vectors, for instance, the vectors $e_1, \ldots, e_n$ themselves. And since an
arbitrary vector of L is a linear combination of the vectors of a basis, then by the
lemma, there cannot exist a greater number of linearly independent vectors.


Corollary 3.34 Theorem 3.33 shows that every basis of a (finite-dimensional) vec- 
tor space consists of the same number of vectors equal to the dimension of the space. 
Therefore to determine the dimension of a vector space, it suffices to find any basis 
in that space. 


As a rule, this is a relatively easy task. For example, it is clear that in the space of
polynomials (in the variable $t$) of degree at most $n$, there is a basis consisting of the
polynomials $1, t, t^2, \ldots, t^n$. This implies that the dimension of the space is $n + 1$.


Example 3.35 Consider the vector space $K^n$ of rows of length $n$ consisting of elements
of an arbitrary field $K$. In this space, there is a basis consisting of the rows

$$\begin{aligned}
e_1 &= (1, 0, 0, \ldots, 0),\\
e_2 &= (0, 1, 0, \ldots, 0),\\
&\;\;\vdots\\
e_n &= (0, 0, 0, \ldots, 1).
\end{aligned} \tag{3.14}$$

In Sect. 1.1, we verified in the proof of Theorem 1.3 that every row of length $n$ is a
linear combination of these $n$ rows. The same reasoning shows that these rows are
linearly independent. Indeed, suppose that $a_1 e_1 + \cdots + a_n e_n = \mathbf{0}$. As we have seen,
$a_1 e_1 + \cdots + a_n e_n$ is equal to $(a_1, \ldots, a_n)$. This means that $a_1 = \cdots = a_n = 0$. Thus
the dimension of the space $K^n$ is $n$.


Example 3.36 Let $M$ be an arbitrary set. Let us denote by $F(M)$ the collection of all
functions on $M$ taking values in some field (the real numbers, complex numbers, or
an arbitrary field $K$). The set $F(M)$ becomes a vector space if for $f_1 \in F(M)$ and
$f_2 \in F(M)$, we define the sum and, for $f \in F(M)$, multiplication by a scalar $\alpha$ using
the formulas

$$(f_1 + f_2)(x) = f_1(x) + f_2(x), \qquad (\alpha f)(x) = \alpha f(x)$$

for arbitrary $x \in M$.
Suppose that the set $M$ is finite. Let us denote by $\delta_x(y)$ the function that is equal
to 1 for $y = x$ and is 0 for all $y \neq x$. Functions $\delta_x(y)$ are called delta functions.




We shall show that they constitute a basis of the space $F(M)$. Indeed, for any function
$f \in F(M)$ we have the obvious equality

$$f(y) = \sum_{x \in M} f(x)\, \delta_x(y), \tag{3.15}$$

from which it follows that an arbitrary function in the space $F(M)$ can be expressed
as a linear combination of the $\delta_x$, $x \in M$. It is clear that the set of all delta functions
is linearly independent (evaluating a linear combination $\sum_x c_x \delta_x$ at a point $y$ gives
$c_y$, so the combination can vanish identically only if every $c_y = 0$), that is, they form
a basis of the vector space $F(M)$. Since the number of functions in this collection is
equal to the number of elements of the set $M$, the space $F(M)$ is finite-dimensional, and
$\dim F(M)$ is equal to the number of elements in $M$. In the case that $M = N_n$ (see the
definition on p. xi), any function $f \in F(N_n)$ is uniquely determined by its values
$f(1), \ldots, f(n)$, which are its coordinates in the decomposition (3.15) with respect to
the basis $\delta_x$, $x \in M$. If we set $a_i = f(i)$, then the numbers $(a_1, \ldots, a_n)$ form a row,
and this shows that the vector space $F(N_n)$ coincides with the space $K^n$. In particular,
the basis of the space $F(N_n)$ consisting of the delta functions coincides with the
basis (3.14) of the space $K^n$.
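The decomposition (3.15) can be checked on a small finite set. The sketch below is illustrative only: the set `M`, the sample function `f`, and the helper names `delta`, `coordinates`, and `combine` are assumptions introduced here, and rational numbers stand in for the field.

```python
from fractions import Fraction

def delta(x):
    """The delta function at x: equal to 1 at y == x and 0 at every other y."""
    return lambda y: Fraction(1) if y == x else Fraction(0)

def coordinates(f, M):
    """Coordinates of f in the delta basis, namely the values f(x) for x in M."""
    return [f(x) for x in M]

def combine(coeffs, M):
    """The function y -> sum over x in M of coeffs[x] * delta_x(y)."""
    basis = [delta(x) for x in M]
    return lambda y: sum(c * d(y) for c, d in zip(coeffs, basis))

M = ["a", "b", "c"]                                   # a hypothetical finite set
f = {"a": Fraction(2), "b": Fraction(-1), "c": Fraction(5)}.get
coords = coordinates(f, M)                            # the row (f(a), f(b), f(c))
g = combine(coords, M)                                # right-hand side of (3.15)
```

The function `g` agrees with `f` at every point of `M`, and the list `coords` has as many entries as `M` has elements, in line with the statement about $\dim F(M)$.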


In many cases, Theorem 3.33 provides a simple method for finding the dimension 
of a vector space. 


Theorem 3.37 The dimension of a vector space $\langle x_1, \ldots, x_m \rangle$ is equal to the maximal
number of linearly independent vectors among the vectors $x_1, \ldots, x_m$.

Therefore, even though the definition of dimension requires the consideration of
all the vectors in the space $\langle x_1, \ldots, x_m \rangle$, Theorem 3.37 makes it possible to limit
consideration to only the vectors $x_1, \ldots, x_m$.

Proof of Theorem 3.37 Let us set $L' = \langle x_1, \ldots, x_m \rangle$ and denote by $l$ the maximum
number of linearly independent vectors among $x_1, \ldots, x_m$. Changing the numeration
if necessary, we may suppose that the first $l$ vectors $x_1, \ldots, x_l$ are linearly
independent. Let $L'' = \langle x_1, \ldots, x_l \rangle$. It is clear that $x_1, \ldots, x_l$ form a basis of the space
$L''$, and by Theorem 3.33, $\dim L'' = l$. We shall prove that $L'' = L'$, which will give us
the result of Theorem 3.37. If $l = m$, then this is obvious. Suppose, then, that $l < m$.
Then by our assumption, for any $k = l + 1, \ldots, m$, the vectors $x_1, \ldots, x_l, x_k$ are linearly
dependent, that is, there is a linear combination $\alpha_1 x_1 + \cdots + \alpha_l x_l + \alpha_k x_k = \mathbf{0}$
in which not all $\alpha_i$ are equal to zero. And furthermore, it is necessary that $\alpha_k \neq 0$,
since otherwise, we would obtain the linear dependence of the vectors $x_1, \ldots, x_l$,
which contradicts the hypothesis. Then

$$x_k = -\alpha_k^{-1} \alpha_1 x_1 - \alpha_k^{-1} \alpha_2 x_2 - \cdots - \alpha_k^{-1} \alpha_l x_l,$$

that is, the vector $x_k$ is in $L''$. We have shown this for all $k > l$, but by construction, it
is also true for $k \le l$. This means that all vectors $x_k$ are in the space $L''$, and hence so
are all linear combinations of them. Therefore, not only do we have $L'' \subset L'$ (which
is obvious by construction), but also $L' \subset L''$, which shows that $L'' = L'$, as desired.
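Theorem 3.37 suggests a direct way to compute the dimension of a span: pick out linearly independent vectors from the generators. The sketch below is one possible illustration, not the book's method. It assumes vectors given as rows of rational coordinates, uses Gaussian elimination for the independence test, and the names `rank` and `maximal_independent_subset` are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Dimension of the span of the given rows, by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def maximal_independent_subset(vectors):
    """Scan the vectors in order, keeping each one that is independent of those kept."""
    chosen = []
    for v in vectors:
        if rank(chosen + [v]) == len(chosen) + 1:
            chosen.append(v)
    return chosen

xs = [(1, 0, 1), (2, 0, 2), (0, 1, 1), (1, 1, 2)]
basis = maximal_independent_subset(xs)
```

For these sample vectors the kept list has two elements, which agrees with `rank(xs)`: every discarded vector is a linear combination of the kept ones, exactly as in the proof above.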




Theorem 3.38 If $L_1$ and $L_2$ are two finite-dimensional vector spaces, then
$\dim(L_1 \oplus L_2) = \dim L_1 + \dim L_2$.


Proof Let $\dim L_1 = r$, $\dim L_2 = s$, let $e_1, \ldots, e_r$ be a basis of the space $L_1$, and let
$f_1, \ldots, f_s$ be a basis of the space $L_2$. We shall show that the collection of $r + s$
vectors $e_1, \ldots, e_r$ and $f_1, \ldots, f_s$ forms a basis of the space $L_1 \oplus L_2$. By the definition
of direct sum, every vector $x \in L_1 \oplus L_2$ can be expressed in the form $x = x_1 + x_2$,
where $x_i \in L_i$. But the vector $x_1$ is a linear combination of the vectors $e_1, \ldots, e_r$,
while the vector $x_2$ is a linear combination of the vectors $f_1, \ldots, f_s$. As a result,
we obtain a representation of the vector $x$ as a linear combination of the $r + s$
vectors $e_1, \ldots, e_r, f_1, \ldots, f_s$. The linear independence of these vectors is just as easily
verified. Suppose there exists a relationship

$$\alpha_1 e_1 + \cdots + \alpha_r e_r + \beta_1 f_1 + \cdots + \beta_s f_s = \mathbf{0}.$$

We set $x_1 = \alpha_1 e_1 + \cdots + \alpha_r e_r$ and $x_2 = \beta_1 f_1 + \cdots + \beta_s f_s$. Then we have the
equality $x_1 + x_2 = \mathbf{0}$ with $x_i \in L_i$. From this, by the definition of the direct sum,
it follows that $x_1 = \mathbf{0}$ and $x_2 = \mathbf{0}$. From the linear independence of the vectors
$e_1, \ldots, e_r$, it follows that $\alpha_1 = 0, \ldots, \alpha_r = 0$, and similarly, $\beta_1 = 0, \ldots, \beta_s = 0$.

Corollary 3.39 For finite-dimensional spaces $L_1, L_2, \ldots, L_k$ for arbitrary $k \ge 2$, we
have

$$\dim(L_1 \oplus L_2 \oplus \cdots \oplus L_k) = \dim L_1 + \dim L_2 + \cdots + \dim L_k.$$


Proof The assertion follows readily from Theorem 3.38 by induction on k. 


Corollary 3.40 If $L_1, \ldots, L_r$ and $L$ are vector spaces such that $L = L_1 + \cdots + L_r$,
and if $\dim L = \dim L_1 + \cdots + \dim L_r$, then $L = L_1 \oplus \cdots \oplus L_r$.


Proof We select a basis in each of the $L_i$ and combine them into a system of vectors
$e_1, \ldots, e_n$. By assumption, the number $n$ of vectors in this system is equal to
$\dim L$, and $L = \langle e_1, \ldots, e_n \rangle$. By Theorem 3.37, the vectors $e_1, \ldots, e_n$ are linearly
independent, and this implies that $L = L_1 \oplus \cdots \oplus L_r$.


These considerations make it possible to give a more visual, geometric characterization
of the notion of linear dependence. Namely, let us prove that vectors
$x_1, \ldots, x_m$ are linearly dependent if and only if they are contained in a subspace $L'$
of dimension less than $m$.

Indeed, let us denote by $l$ the largest number of linearly independent vectors
among $x_1, \ldots, x_m$. Let us assume that these independent vectors are $x_1, \ldots, x_l$ and
set $L' = \langle x_1, \ldots, x_l \rangle$. Then for $l = m$, the vectors $x_1, \ldots, x_m$ are linearly independent,
and our assertion follows from the definition of dimension. If $l < m$, then all
the vectors $x_1, \ldots, x_m$ are contained in the subspace $L'$, whose dimension, by
Theorem 3.33, is $l$, and the assertion is correct.




Using the concepts introduced thus far, it is possible to prove a useful general- 
ization of Theorem 3.38. 


Theorem 3.41 For any two finite-dimensional vector spaces $L_1$ and $L_2$, one has the
equality

$$\dim(L_1 + L_2) = \dim L_1 + \dim L_2 - \dim(L_1 \cap L_2). \tag{3.16}$$

Theorem 3.38 is obtained as a simple corollary of Theorem 3.41. Indeed, if
$L_1 + L_2 = L_1 \oplus L_2$, then by Corollary 3.15, the intersection $L_1 \cap L_2$ is equal to $(\mathbf{0})$, and it
remains only to use the fact that $\dim(\mathbf{0}) = 0$.


Proof of Theorem 3.41 Let us set $L_0 = L_1 \cap L_2$. From Corollary 3.31, it follows that
there exist subspaces $L_1' \subset L_1$ and $L_2' \subset L_2$ such that

$$L_1 = L_0 \oplus L_1', \qquad L_2 = L_0 \oplus L_2'. \tag{3.17}$$

Formula (3.16) follows easily from the equality $L_1 + L_2 = L_0 \oplus L_1' \oplus L_2'$. Indeed,
since $L_0 = L_1 \cap L_2$, then in view of relationship (3.17) and Theorem 3.38, we obtain
$L_1 + L_2 = L_1 \oplus L_2'$, and therefore,

$$\dim(L_1 + L_2) = \dim L_1 + \dim L_2' = \dim L_1 + \dim L_2 - \dim L_0,$$

which yields relationship (3.16).

Let us prove that $L_1 + L_2 = L_0 \oplus L_1' \oplus L_2'$. It is clear that each subspace $L_0$, $L_1'$, $L_2'$
is contained in $L_1 + L_2$, so that their sum $L_0 + L_1' + L_2'$ is also contained in $L_1 + L_2$.
But an arbitrary vector $z \in L_1 + L_2$ can be represented in the form $z = x + y$, where
$x \in L_1$, $y \in L_2$, and in view of relationship (3.17), we have the representations
$x = u + v$ and $y = u' + w$, where $u, u' \in L_0$, $v \in L_1'$, $w \in L_2'$, from which we obtain
$z = x + y = (u + u') + v + w$, and this means that the vector $z$ is contained in
$L_0 + L_1' + L_2'$. From this, it follows that

$$L_1 + L_2 = L_0 + L_1' + L_2' = L_1 + L_2'.$$

But $L_1 \cap L_2' = (\mathbf{0})$, since a vector $x \in L_1 \cap L_2'$ is contained both in $L_1 \cap L_2 = L_0$ and
in $L_2'$, while in view of (3.17), the intersection $L_0 \cap L_2'$ is equal to $(\mathbf{0})$. As a result,
we obtain the required equality

$$L_1 + L_2 = (L_0 \oplus L_1') + L_2' = (L_0 \oplus L_1') \oplus L_2' = L_0 \oplus L_1' \oplus L_2',$$

which, as we have seen, proves Theorem 3.41.


Corollary 3.42 Let $L_1$ and $L_2$ be subspaces of a finite-dimensional vector space $L$.
Then from the inequality $\dim L_1 + \dim L_2 > \dim L$, it follows that $L_1 \cap L_2 \neq (\mathbf{0})$, that
is, the subspaces $L_1$ and $L_2$ have a nonzero vector in common.




Indeed, in this case, $L_1 + L_2 \subset L$, which means that $\dim(L_1 + L_2) \le \dim L$. Taking
this into account, we obtain from (3.16) that

$$\dim(L_1 \cap L_2) = \dim L_1 + \dim L_2 - \dim(L_1 + L_2) \ge \dim L_1 + \dim L_2 - \dim L > 0,$$

from which it follows that $L_1 \cap L_2 \neq (\mathbf{0})$.

For example, two planes passing through the origin in three-dimensional space 
have a straight line in common. 
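The example of two planes can be checked numerically with formula (3.16). The following sketch makes several assumptions for illustration: the two planes are sample subspaces of the rational three-dimensional space given by spanning rows, dimensions are computed by Gaussian elimination, and `rank` and `in_span` are hypothetical helper names.

```python
from fractions import Fraction

def rank(rows):
    """Dimension of the span of the given rows, by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_span(v, generators):
    """True if adjoining v does not enlarge the span of the generators."""
    return rank(generators + [v]) == rank(generators)

L1 = [(1, 0, 0), (0, 1, 0)]          # the plane z = 0
L2 = [(1, 0, 1), (0, 1, 1)]          # the plane x + y = z
dim_sum = rank(L1 + L2)              # dim(L1 + L2): spanned by all four rows
dim_int = rank(L1) + rank(L2) - dim_sum   # formula (3.16) solved for the intersection
common = (1, -1, 0)                  # a candidate common vector, checked below
```

With these planes, `dim_sum` is 3 and `dim_int` is 1, and the vector `common` lies in both spans, so the two planes do share a line through the origin.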

We shall now obtain an expression for the dimension of a subspace $\langle a_1, \ldots, a_m \rangle$
using the theory of determinants. Let $a_1, \ldots, a_m$ be vectors in the space $L$, and let
$e_1, \ldots, e_n$ be some basis of $L$. We shall write the coordinates of the vector $a_i$ in this
basis as the $i$th row of a matrix $A$:

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.$$

Theorem 3.43 The dimension of the vector space $\langle a_1, \ldots, a_m \rangle$ is equal to the rank
of the matrix $A$.

Proof The linear dependence of the vectors $a_1, \ldots, a_k$ for $k \le m$ is equivalent to
the linear dependence of the rows of the matrix $A$ consisting of the same numbers.
In Theorem 2.41 we proved that if the rank of a matrix is equal to $r$, then all of
its rows are linear combinations of some collection of $r$ of its rows. From this it
follows already that $\dim \langle a_1, \ldots, a_m \rangle \le r$. But in fact, from the proof of the same
Theorem 2.41, it follows that for such a collection of $r$ rows, one may take any $r$
rows of the matrix in which there is a nonzero minor of order $r$ (see the remark
following Theorem 2.41). Let us show that such a collection of $r$ rows is linearly
independent, from which we will already have a proof of Theorem 3.43. We may
assume that a nonzero minor $M_r$ is located in the first $r$ columns and first $r$ rows
of the matrix $A$. We then have to establish the linear independence of the vectors
$a_1, \ldots, a_r$. If we assume that $\alpha_1 a_1 + \cdots + \alpha_r a_r = \mathbf{0}$, then if we focus attention on
only the first $r$ coordinates of the vectors, we obtain $r$ homogeneous linear equations
in the unknown coefficients $\alpha_1, \ldots, \alpha_r$. It is easy to see that the determinant of the
matrix of this system is equal to $M_r \neq 0$, and as a consequence, it has a unique solution,
which is the zero solution: $\alpha_1 = 0, \ldots, \alpha_r = 0$. That is, the vectors $a_1, \ldots, a_r$
are indeed linearly independent.


In the past, Theorem 3.43 was formulated in the following form, which is also
sometimes useful. Consider the vector space $K^n$ of rows of length $n$ (where $K$ is the
field of real numbers, the field of complex numbers, or an arbitrary field). Then the
vectors $a_i$ will be rows of length $n$ (in our case, the rows of the matrix $A$). From the
proof of Theorem 3.43 we have at once the following corollary.




Corollary 3.44 The rank of a matrix A is equal to the largest number of linearly 
independent rows of A. 




From this, we obtain the following unexpected result. 


Corollary 3.45 The rank of a matrix A is also equal to the largest number of lin- 
early independent columns of A. 


This follows at once from the definition of the rank of a matrix and Theorem 2.32. 
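Corollaries 3.44 and 3.45 can be checked on a small matrix by computing the rank in two different ways. The sketch below is a brute-force illustration, suitable only for tiny matrices: the rank by minors is taken as the largest order of a nonzero minor, the count of independent rows or columns is obtained by Gaussian elimination, and all function names and the sample matrix are assumptions introduced here.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (tiny matrices only)."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def rank_by_minors(a):
    """Largest r for which some minor of order r is nonzero."""
    rows, cols = len(a), len(a[0])
    for r in range(min(rows, cols), 0, -1):
        for ri in combinations(range(rows), r):
            for ci in combinations(range(cols), r):
                if det([[a[i][j] for j in ci] for i in ri]) != 0:
                    return r
    return 0

def independent_rows(a):
    """Largest number of linearly independent rows, by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in a]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

A = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]]
A_transposed = [list(col) for col in zip(*A)]   # columns of A written as rows
```

For this sample matrix, `rank_by_minors(A)`, `independent_rows(A)`, and `independent_rows(A_transposed)` all give the same number, as the two corollaries predict.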

To conclude this section, let us examine in greater detail the case of real vector 
spaces, and to this end, introduce some important notions that will be used in the 
sequel. 

Let $L'$ be a hyperplane in the finite-dimensional vector space $L$, that is,
$\dim L' = \dim L - 1$. Then this hyperplane divides $L$ into two parts, as shown in Fig. 3.4 for
the case of a line and a plane.

Indeed, since $L' \neq L$, there exists a vector $e \in L$, $e \notin L'$. From this, it follows that
$L = L' \oplus \langle e \rangle$. For according to the choice of $e$, the intersection $L' \cap \langle e \rangle$ is equal to
$(\mathbf{0})$, and by Theorem 3.38, we have the equality

$$\dim(L' \oplus \langle e \rangle) = \dim L' + 1 = \dim L,$$

from which we obtain, with the help of Theorem 3.24, that $L' \oplus \langle e \rangle = L$. Thus an
arbitrary vector $x \in L$ can be uniquely expressed in the form

$$x = \alpha e + u, \qquad u \in L', \tag{3.18}$$

where $\alpha$ is some scalar. Since the scalars in our case are real, it makes sense to talk
about their sign. The collection of vectors $x$ expressed as in (3.18) for which $\alpha > 0$
is denoted by $L^+$. Likewise, the set of vectors $x$ of the form (3.18) for which $\alpha < 0$
is denoted by $L^-$. The sets $L^+$ and $L^-$ are called half-spaces of the space $L$. Clearly,
$L \setminus L' = L^+ \cup L^-$.

Of course, our construction depends not only on the hyperplane $L'$, but also on the
choice of the vector $e \notin L'$. It is important to note that with a change in the vector $e$,
the half-spaces $L^+$ and $L^-$ might change, but the pair $(L^+, L^-)$ will remain as before;
that is, either the half-spaces do not change at all, or else they exchange places. Indeed,
let $e' \notin L'$ be some other vector. Then it can be represented in the form $e' = \lambda e + v$,
where the number $\lambda$ is nonzero and $v$ is in $L'$. This means that $e = \lambda^{-1}(e' - v)$. Then
for an arbitrary vector $x$ from (3.18), we obtain, as in (3.18), the representation

$$x = \alpha \lambda^{-1} (e' - v) + u = \alpha \lambda^{-1} e' + u', \qquad u' \in L',$$

where $u' = u - \alpha \lambda^{-1} v$, and we see that in passing from $e$ to $e'$, the scalar $\alpha$ in the
decomposition (3.18) is multiplied by $\lambda^{-1}$. Hence the half-spaces $L^+$ and $L^-$ do not
change if $\lambda > 0$, and they exchange places if $\lambda < 0$.

The above definition of decomposition of a real vector space L by a hyperplane 
L’ has a natural interpretation in topological terms (see pp. xvii—xix). Readers not 
interested in this aspect of these ideas can skip the following five paragraphs. 

If we wish to use topological terminology, then we are going to have to introduce
on $L$ the notion of convergence of a sequence of vectors. We shall do this using the
notion of a metric (see p. xviii). Let us choose in $L$ an arbitrary basis $e_1, \ldots, e_n$, and
for vectors $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ and $y = \beta_1 e_1 + \cdots + \beta_n e_n$, we define the number
$r(x, y)$ by means of the formula

$$r(x, y) = |\alpha_1 - \beta_1| + \cdots + |\alpha_n - \beta_n|.$$

It easily follows from the properties of absolute value that all three conditions
in the definition of a metric space are satisfied. Thus the vector space $L$ and all
of its subspaces are metric spaces with the metric $r(x, y)$, and for a sequence
of vectors there is automatically defined the notion of convergence: $x_k \to x$ as
$k \to \infty$ if $r(x_k, x) \to 0$ as $k \to \infty$. In other words, if $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ and
$x_k = \alpha_{1k} e_1 + \cdots + \alpha_{nk} e_n$, then the convergence $x_k \to x$ is equivalent to the
convergence of the $n$ coordinate sequences: $\alpha_{ik} \to \alpha_i$ for all $i = 1, \ldots, n$. We observe
that in the definition of $r(x, y)$, we have used the coordinates of the vectors $x$ and $y$
in some basis, and consequently, the metric obtained depends on the choice of basis.
Nevertheless, the notion of convergence does not depend on the choice of basis
$e_1, \ldots, e_n$. This follows easily from the formulas (3.35) relating the coordinates of
a vector in various bases, which will be presented later.

The meaning of a partition $L \setminus L' = L^+ \cup L^-$ consists in the fact that the metric
space $L \setminus L'$ is not path-connected, while $L^+$ and $L^-$ are its path-connected components.

Indeed, let us suppose that in the metric space $L \setminus L'$, there exists a deformation
of the vector $x$ to $y$, that is, a continuous mapping $f : [0, 1] \to L \setminus L'$ such that
$f(0) = x$ and $f(1) = y$. Then by formula (3.18), we have the representation

$$x = \alpha e + u, \qquad y = \beta e + v, \qquad f(t) = \gamma(t) e + w(t), \tag{3.19}$$

where $u, v \in L'$ and $w(t) \in L'$ for all $t \in [0, 1]$, and $\gamma(t)$ is a function taking real
values, continuous in the interval $[0, 1]$, and moreover, $\gamma(0) = \alpha$ and $\gamma(1) = \beta$.

If $x \in L^+$ and $y \in L^-$, then $\alpha > 0$ and $\beta < 0$, and by properties of continuous
functions known from calculus, $\gamma(t) = 0$ for some $0 < t < 1$. But then the vector
$f(t) = w(t)$ is contained within the hyperplane $L'$, and it follows that vectors $x$ and
$y$ cannot be deformed into each other in the set $L \setminus L'$. Therefore, the metric space
$L \setminus L'$ is not path-connected. But if $x, y \in L^+$ or $x, y \in L^-$, then in the representations
(3.19) for these vectors, the numbers $\alpha$ and $\beta$ have the same sign. Then, as is
easily seen, the mapping $f(t) = (1 - t)x + ty$, $t \in [0, 1]$, determines a continuous
deformation of $x$ to $y$ in the set $L^+$ or $L^-$, respectively.

Fig. 3.5 Bases assigning one and the same flag

From these considerations, it is easy to obtain a proof of the previous assertion 
without using any formulas. 

If we distinguish one of the two half-spaces $L^+$ and $L^-$ (we shall denote the
half-space thus distinguished by $L^+$), then the pair $(L, L')$ is said to be directed.
For example, in the case of a line (Fig. 3.4(a)), this corresponds to a choice of the
direction of the line $L$.

Using these concepts, we can obtain a more visual idea of the notion of basis (in 
the case of a real vector space). 


Definition 3.46 A flag in a finite-dimensional vector space $L$ is a sequence of subspaces

$$(\mathbf{0}) \subset L_1 \subset L_2 \subset \cdots \subset L_n = L \tag{3.20}$$

such that

(a) $\dim L_i = i$ for all $i = 1, \ldots, n$.
(b) Each pair $(L_i, L_{i-1})$ is directed.


It is clear that in view of condition (a), the subspace $L_{i-1}$ is a hyperplane in $L_i$,
and therefore the above definition of directedness is applicable.

Every basis $e_1, \ldots, e_n$ of a space $L$ determines a particular flag. Namely, we set
$L_i = \langle e_1, \ldots, e_i \rangle$, and to apply directedness to the pair $(L_i, L_{i-1})$, we select as $L_i^+$,
from the two half-spaces of $L_i$, the one determined by the vector $e_i$ (clearly, $e_i \notin L_{i-1}$).

However, we must observe that different bases of the space $L$ can determine one
and the same flag. For example, in Fig. 3.5, the bases $(e_1, e_2)$ and $(e_1, e_2')$ determine
the same flag in the plane. But later, in Sect. 7.2, we shall meet a situation in which
there is defined a bijection between the bases of a vector space and its flags (this is
accomplished through the selection of some special bases).


3.3 Linear Transformations of Vector Spaces 


Here we shall present a very broad generalization of the notion of linear function,
with which our course began. The generalization occurs in two aspects. First, in
Sect. 1.1, a linear function was defined as a function of rows of length $n$. Here, we
shall replace the rows of given length with vectors of an arbitrary vector space $L$.
Second, the value of the linear function in Sect. 1.1 was considered a number, that
is, in other words, an element of the space $\mathbb{R}^1$ or $\mathbb{C}^1$ or $K^1$ for an arbitrary field $K$.
We shall now replace the numbers with vectors in an arbitrary vector space $M$. Thus
our definition will include two vector spaces $L$ and $M$. The reader may consider both
spaces real, complex, or defined over an arbitrary field $K$, but it must be the same
field for both $L$ and $M$. In this case, we shall speak about the elements of the field
using the same conventions that we established in Sect. 3.1 for scalars (see p. 82).
Let us recall that a linear function is defined by properties (1.8) and (1.9),
presented in Theorem 1.3 on page 3. The following definition is analogous to this.


Definition 3.47 A linear transformation of a vector space $L$ to another vector space
$M$ is a mapping $\mathcal{A} : L \to M$ that assigns to each vector $x \in L$ some vector $\mathcal{A}(x) \in M$
and exhibits the following properties:

$$\mathcal{A}(x + y) = \mathcal{A}(x) + \mathcal{A}(y), \qquad \mathcal{A}(\alpha x) = \alpha \mathcal{A}(x) \tag{3.21}$$

for every scalar $\alpha$ and all vectors $x$ and $y$ in the space $L$.


A linear transformation is also called an operator or (only in the case that M = L) 
an endomorphism. 

Let us note one obvious but useful property that follows directly from the defini- 
tions. 


Proposition 3.48 Under any linear transformation, the image of the null vector is
the null vector. More precisely, since we may be dealing with two different vector
spaces, we might reformulate the statement in the following form: if $\mathcal{A} : L \to M$ is a
linear transformation, and $\mathbf{0} \in L$ and $\mathbf{0}' \in M$ are the null vectors in the vector spaces
$L$ and $M$, then $\mathcal{A}(\mathbf{0}) = \mathbf{0}'$.


Proof By the definition of a vector space, for an arbitrary vector $x \in L$, there exists
an additive inverse $-x \in L$, that is, a vector such that $x + (-x) = \mathbf{0}$, and moreover
(see p. 82), the vector $-x$ is obtained by multiplying $x$ by the number $-1$. Applying
the linear transformation $\mathcal{A}$ to both sides of the equality $\mathbf{0} = x + (-x)$, we obtain, in view
of properties (3.21), $\mathcal{A}(\mathbf{0}) = \mathcal{A}(x) - \mathcal{A}(x) = \mathbf{0}'$, since for the vector $\mathcal{A}(x)$
of the space $M$, the vector $-\mathcal{A}(x)$ is its additive inverse, and their sum is $\mathbf{0}'$.


Example 3.49 For an arbitrary vector space $L$, the identity mapping defines a linear
transformation $\mathcal{E}(x) = x$, for every $x \in L$, from the space $L$ to itself.


Example 3.50 A rotation of the plane $\mathbb{R}^2$ through some angle about the origin is
a linear transformation (here $L = M = \mathbb{R}^2$). The conditions of (3.21) are clearly
satisfied here.




Example 3.51 If $L$ is the space of continuously differentiable functions on an
interval $[a, b]$, and $M$ is the space of continuous functions on the same interval, and
if for $x = f(t)$, we define $\mathcal{A}(x) = f'(t)$, then the mapping $\mathcal{A} : L \to M$ is a linear
transformation.


Example 3.52 If $L$ is the space of twice continuously differentiable functions on
an interval $[a, b]$, $M$ is the same space as in the previous example, $q(t)$ is some
continuous function on the interval $[a, b]$, and for $x = f(t)$ we set
$\mathcal{A}(x) = f''(t) + q(t) f(t)$, then the mapping $\mathcal{A} : L \to M$ is a linear transformation. In analysis, it is
known as the Sturm–Liouville operator.


Example 3.53 Let $L$ be the space of all polynomials, and for $x = f(t)$, as in
Example 3.51, we set $\mathcal{A}(x) = f'(t)$. Clearly, $\mathcal{A} : L \to L$ is a linear transformation (that is,
here we have $M = L$). But if $L$ is the space of polynomials of degree at most $n$, and
$M$ is the space of polynomials of degree at most $n - 1$, then the same formula gives
a linear transformation $\mathcal{A} : L \to M$.


Example 3.54 Suppose we are given the representation of a space $L$ as a direct
sum of two subspaces: $L = L' \oplus L''$. This means that every vector $x \in L$ can be
uniquely represented in the form $x = x' + x''$, where $x' \in L'$ and $x'' \in L''$. Assigning
to each vector $x \in L$ the term $x' \in L'$ in this representation gives a mapping
$\mathcal{P} : L \to L'$, $\mathcal{P}(x) = x'$. A simple verification shows that $\mathcal{P}$ is a linear transformation. It is
called the projection onto the subspace $L'$ parallel to $L''$. In this case, for the vector
$x \in L$, its image $\mathcal{P}(x) \in L'$ is called the projection vector of $x$ onto $L'$ parallel to $L''$.
Analogously, for any subset $X \subset L$, its image $\mathcal{P}(X) \subset L'$ is called the projection of
$X$ onto $L'$ parallel to $L''$.
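In the plane, the projection of Example 3.54 can be computed explicitly. The sketch below assumes that $L$ is the rational plane, that $L'$ and $L''$ are the lines spanned by two sample vectors `u` and `w` forming a basis, and that the coefficient of `u` is found by Cramer's rule; the function name `project` is hypothetical.

```python
from fractions import Fraction

def project(x, u, w):
    """Projection of x onto the line <u> parallel to the line <w> in the plane.
    Writes x = a*u + b*w and returns the term a*u."""
    d = Fraction(u[0] * w[1] - u[1] * w[0])   # determinant of the basis (u, w)
    if d == 0:
        raise ValueError("u and w must be linearly independent")
    a = (x[0] * w[1] - x[1] * w[0]) / d       # coefficient of u, by Cramer's rule
    return (a * u[0], a * u[1])

u, w = (1, 0), (1, 1)        # sample spanning vectors of L' and L''
x, y = (3, 2), (-1, 4)
px = project(x, u, w)        # x = 1*u + 2*w, so the projection is u itself
```

With this sample data, `project` is additive on `x` and `y`, and applying it twice gives the same result as applying it once, as one expects of a projection.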


Example 3.55 Let $L = M$ and $\dim L = \dim M = 1$. Then $L = M = \langle e \rangle$, where $e$ is
some nonnull vector and $\mathcal{A}(e) = \alpha e$, where $\alpha$ is a scalar. From the definition of
a linear transformation, it follows directly that $\mathcal{A}(x) = \alpha x$ for every vector $x \in L$.
Consequently, such is the general form of all linear transformations $\mathcal{A} : L \to L$ in
the case $\dim L = 1$.


In the sequel, we shall consider the case that the dimensions of the spaces $L$ and
$M$ are finite. This means that in $L$, there exists some basis $e_1, \ldots, e_n$, and in $M$, there
is a basis $f_1, \ldots, f_m$. Then every vector $x \in L$ can be written in the form

$$x = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n.$$

Using the relationship (3.21) several times, we shall obtain that for any linear
transformation $\mathcal{A} : L \to M$, the image of the vector $x$ is equal to

$$\mathcal{A}(x) = \alpha_1 \mathcal{A}(e_1) + \alpha_2 \mathcal{A}(e_2) + \cdots + \alpha_n \mathcal{A}(e_n). \tag{3.22}$$




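The vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_n)$ belong to the space $M$, and by the definition of a
basis, they are linear combinations of the vectors $f_1, \ldots, f_m$, that is,

$$\begin{aligned}
\mathcal{A}(e_1) &= a_{11} f_1 + a_{21} f_2 + \cdots + a_{m1} f_m,\\
\mathcal{A}(e_2) &= a_{12} f_1 + a_{22} f_2 + \cdots + a_{m2} f_m,\\
&\;\;\vdots\\
\mathcal{A}(e_n) &= a_{1n} f_1 + a_{2n} f_2 + \cdots + a_{mn} f_m.
\end{aligned} \tag{3.23}$$

On the other hand, the image $\mathcal{A}(x)$ of the vector $x$ belonging to the space $M$ has in
the basis $f_1, \ldots, f_m$ certain coordinates $\beta_1, \ldots, \beta_m$, that is, it can be written in the
form

$$\mathcal{A}(x) = \beta_1 f_1 + \beta_2 f_2 + \cdots + \beta_m f_m, \tag{3.24}$$

and moreover, such a representation is unique.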
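Substituting in (3.22) the expression (3.23) for $\mathcal{A}(e_i)$ and grouping terms as
necessary, we obtain a representation of $\mathcal{A}(x)$ in the form

$$\begin{aligned}
\mathcal{A}(x) &= \alpha_1 (a_{11} f_1 + a_{21} f_2 + \cdots + a_{m1} f_m) + \cdots
+ \alpha_n (a_{1n} f_1 + a_{2n} f_2 + \cdots + a_{mn} f_m)\\
&= (\alpha_1 a_{11} + \alpha_2 a_{12} + \cdots + \alpha_n a_{1n}) f_1 + \cdots
+ (\alpha_1 a_{m1} + \alpha_2 a_{m2} + \cdots + \alpha_n a_{mn}) f_m.
\end{aligned}$$

Because of the uniqueness of the decomposition (3.24), we thus obtain expressions
for the coordinates $\beta_1, \ldots, \beta_m$ of the vector $\mathcal{A}(x)$ in terms of the coordinates
$\alpha_1, \ldots, \alpha_n$ of the vector $x$:

$$\begin{aligned}
\beta_1 &= a_{11} \alpha_1 + a_{12} \alpha_2 + \cdots + a_{1n} \alpha_n,\\
\beta_2 &= a_{21} \alpha_1 + a_{22} \alpha_2 + \cdots + a_{2n} \alpha_n,\\
&\;\;\vdots\\
\beta_m &= a_{m1} \alpha_1 + a_{m2} \alpha_2 + \cdots + a_{mn} \alpha_n.
\end{aligned} \tag{3.25}$$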


Formula (3.25) gives us an explicit expression for the action of the linear
transformation $\mathcal{A}$ for the chosen coordinates (that is, bases) of the spaces $L$ and $M$. This
expression represents by itself the linear substitution of variables with the matrix

$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}, \tag{3.26}$$

consisting of the coefficients that enter into the formula (3.25). The matrix $A$ is of
type $(m, n)$ and is the transpose of the matrix consisting of the coefficients of the
linear combinations in formula (3.23).




Definition 3.56 The matrix $A$ in (3.26) is called the matrix of the linear
transformation $\mathcal{A} : L \to M$ given by formula (3.23) in the bases $e_1, \ldots, e_n$ and $f_1, \ldots, f_m$.


In other words, the matrix $A$ of the linear transformation $\mathcal{A}$ is a matrix whose
$i$th column consists of the coordinates of the vector $\mathcal{A}(e_i)$ in the basis $f_1, \ldots, f_m$.
We would like to emphasize that the coordinates are written in the columns, and not
in the rows (which, of course, also would have been possible), which has a number
of advantages. It is clear that the matrix of the linear transformation depends on
both bases $e_1, \ldots, e_n$ and $f_1, \ldots, f_m$. The situation here is the same as with the
coordinates of a vector. A linear transformation has no matrix “in and of itself”: in
order to associate a matrix with the transformation, it is necessary to choose bases
in the spaces $L$ and $M$.

Using matrix multiplication, as defined in Sect. 2.9, one can write formula (3.25)
in a more compact form. To do so, we introduce the following notation: Let $\alpha$ be a
row vector (a matrix of type $(1, n)$), with coordinates $\alpha_1, \ldots, \alpha_n$, and let $\beta$ be a row
vector with coordinates $\beta_1, \ldots, \beta_m$. Similarly, let $[\alpha]$ be a column vector (a matrix
of type $(n, 1)$), consisting of the same coordinates $\alpha_1, \ldots, \alpha_n$, only now written
vertically, and let $[\beta]$ be a column vector consisting of $\beta_1, \ldots, \beta_m$, that is,

$$[\alpha] = \begin{pmatrix} \alpha_1\\ \vdots\\ \alpha_n \end{pmatrix}, \qquad
[\beta] = \begin{pmatrix} \beta_1\\ \vdots\\ \beta_m \end{pmatrix}.$$

It is clear that $\alpha$ and $[\alpha]$ are interchanged under the transpose operation, that is,
$\alpha^* = [\alpha]$, and similarly, $\beta^* = [\beta]$. Recalling the definition of matrix multiplication,
we see that formula (3.25) has the form

$$[\beta] = A[\alpha] \quad \text{or} \quad \beta = \alpha A^*. \tag{3.27}$$


The formulas that we have obtained show that with the chosen bases, a linear
transformation is uniquely determined by its matrix. Conversely, having chosen
bases for the vector spaces $L$ and $M$ in some way, then if we define the mapping
$\mathcal{A} : L \to M$ with the help of relationships (3.22) and (3.23) with arbitrary matrix
$A = (a_{ij})$, it is easy to verify that $\mathcal{A}$ will be a linear transformation. Therefore, there
exists a bijection between the set $\mathcal{L}(L, M)$ of linear transformations of $L$ into $M$ and the
set of matrices of type $(m, n)$. It is the choice of bases in the spaces $L$ and $M$ that
determines this correspondence. In the following section, we shall explain precisely
how the matrix of a linear transformation depends on the choice of bases.
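As an illustration of Definition 3.56 and formula (3.27), consider differentiation of polynomials as in Example 3.53. The sketch below assumes $n = 3$, the bases $1, t, t^2, t^3$ and $1, t, t^2$, and polynomials stored as coefficient lists with the lowest degree first; the function names `derivative`, `matrix_of`, and `apply` are hypothetical.

```python
def derivative(coeffs):
    """Coefficients of f'(t), given those of f(t) = c0 + c1*t + c2*t^2 + ..."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def matrix_of(transform, n):
    """Matrix whose i-th column holds the coordinates of transform(e_i),
    where e_i is the i-th vector of the chosen basis of L."""
    columns = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        columns.append(transform(e_i))
    m = len(columns[0])
    return [[columns[i][j] for i in range(n)] for j in range(m)]

def apply(matrix, alpha):
    """Coordinates [beta] = A[alpha], as in formula (3.27)."""
    return [sum(a * x for a, x in zip(row, alpha)) for row in matrix]

A = matrix_of(derivative, 4)      # a matrix of type (3, 4)
alpha = [5, -1, 4, 2]             # the polynomial 5 - t + 4t^2 + 2t^3
beta = apply(A, alpha)            # coordinates of its derivative
```

Here `A` has three rows and four columns, and `beta` coincides with `derivative(alpha)`, so multiplying by the matrix reproduces the action of the transformation on coordinates.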

We shall denote the space of all linear transformations of the space $L$ into $M$ by
$\mathcal{L}(L, M)$. This set can itself be viewed as a vector space if for the mappings $\mathcal{A}$ and
$\mathcal{B}$ in $\mathcal{L}(L, M)$ we define the vector sum and the product of a vector and a scalar $\alpha$ by
the following formulas:

$$(\mathcal{A} + \mathcal{B})(x) = \mathcal{A}(x) + \mathcal{B}(x), \qquad (\alpha \mathcal{A})(x) = \alpha \mathcal{A}(x). \tag{3.28}$$




It is easily checked that $\mathcal{A} + \mathcal{B}$ and $\alpha \mathcal{A}$ are again linear transformations of $L$ into $M$,
that is, each of them satisfies conditions (3.21), while the operations that we have
defined satisfy conditions (a)–(h) of a vector space. The null vector of the space
$\mathcal{L}(L, M)$ is the linear transformation $\mathcal{O} : L \to M$, defined by the formula $\mathcal{O}(x) = \mathbf{0}$
for all $x \in L$ (in the last equality, $\mathbf{0}$ denotes, of course, the null vector of the space
$M$). It is called the null transformation.

For some bases, suppose the matrix $A$ of type (3.26) corresponds to the
transformation $\mathcal{A} : L \to M$, while the matrix $B$ of the same type corresponds to the
transformation $\mathcal{B} : L \to M$. We now explain how these matrices correspond to the
transformations $\mathcal{A} + \mathcal{B}$ and $\alpha \mathcal{A}$ defined by the conditions (3.28). By (3.23), we have

$$\begin{aligned}
(\mathcal{A} + \mathcal{B})(e_i) &= a_{1i} f_1 + a_{2i} f_2 + \cdots + a_{mi} f_m + b_{1i} f_1 + b_{2i} f_2 + \cdots + b_{mi} f_m\\
&= (a_{1i} + b_{1i}) f_1 + (a_{2i} + b_{2i}) f_2 + \cdots + (a_{mi} + b_{mi}) f_m,
\end{aligned}$$

and consequently the matrix $A + B$ corresponds to the transformation $\mathcal{A} + \mathcal{B}$. It can
be checked even more simply that the transformation $\alpha \mathcal{A}$ corresponds to the matrix
$\alpha A$. We thus see again that the set of linear transformations $\mathcal{L}(L, M)$, or the set of
matrices of type $(m, n)$, is converted into a vector space.

In conclusion, let us consider the composition of mappings that are linear trans- 
formations. 

Let $L$, $M$, $N$ be vector spaces, and let $\mathcal{A} : L \to M$ and $\mathcal{B} : M \to N$ be linear
transformations. We observe that this is a special case of mappings between arbitrary
sets, and by the general definition (see p. xiv), the composition of mappings $\mathcal{B}$ and
$\mathcal{A}$ is the mapping $\mathcal{B}\mathcal{A} : L \to N$ given by the formula

$$(\mathcal{B}\mathcal{A})(x) = \mathcal{B}(\mathcal{A}(x)) \tag{3.29}$$

for all $x \in L$. A simple verification shows that $\mathcal{B}\mathcal{A}$ is a linear transformation: it is
necessary only to verify by substitution into (3.29) that all the relationships (3.21)
are satisfied by $\mathcal{B}\mathcal{A}$ if they are satisfied for $\mathcal{A}$ and $\mathcal{B}$. In particular, in the case
$L = M = N$ we obtain that the composition of linear transformations from $L$ to $L$ is
again a linear transformation from $L$ to $L$.

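Let us assume now that in the vector spaces $L$, $M$, and $N$ we have chosen bases
$e_1, \ldots, e_n$, $f_1, \ldots, f_m$, and $g_1, \ldots, g_l$. We shall denote the matrix of the linear
transformation $\mathcal{A}$ in the bases $e_1, \ldots, e_n$ and $f_1, \ldots, f_m$ by $A$, and the matrix of the
linear transformation $\mathcal{B}$ in the bases $f_1, \ldots, f_m$ and $g_1, \ldots, g_l$ by $B$, and we seek
the matrix of the linear transformation $\mathcal{B}\mathcal{A}$ in the bases $e_1, \ldots, e_n$ and $g_1, \ldots, g_l$.
To this end, we must substitute the formulas of (3.23) for the transformation $\mathcal{A}$ into
analogous formulas for the transformation $\mathcal{B}$:

$$\begin{aligned}
\mathcal{B}(f_1) &= b_{11} g_1 + b_{21} g_2 + \cdots + b_{l1} g_l,\\
\mathcal{B}(f_2) &= b_{12} g_1 + b_{22} g_2 + \cdots + b_{l2} g_l,\\
&\;\;\vdots\\
\mathcal{B}(f_m) &= b_{1m} g_1 + b_{2m} g_2 + \cdots + b_{lm} g_l.
\end{aligned} \tag{3.30}$$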




Formulas (3.23) and (3.30) represent two linear replacements in which the vec- 
tors play the role of the variables, whereas in other respects, they are no different 
from linear replacements of variables as examined by us earlier (see p. 62). Conse- 
quently, the result of sequentially applying these replacements will be the same as 
in Sect. 2.9, namely linear replacement with the matrix BA; that is, we obtain the 
relationship 


$$(\mathcal{BA})(e_i) = \sum_{j=1}^{l} c_{ji}\, g_j, \qquad i = 1, \ldots, n,$$

where the matrix $C = (c_{ji})$ of the transformation $\mathcal{BA}$ is $BA$. We have thus
established that the composition of linear transformations corresponds to the
multiplication of their matrices, taken in the same order.
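The correspondence between composition and matrix multiplication can be checked numerically. The following is only a minimal sketch: the matrices `A` and `B`, the coordinate column `x`, and the helper names `mat_vec` and `mat_mul` are hypothetical and chosen purely for illustration. It compares the coordinates of $\mathcal{B}(\mathcal{A}(x))$ computed in two steps with those obtained from the single matrix $BA$.

```python
def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the coordinate column x."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]


def mat_mul(P, Q):
    """Matrix product PQ, where P is l x m and Q is m x n."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


# Illustrative data (an assumption, not taken from the text):
# A is the matrix of a transformation from a 2-dimensional space to a
# 3-dimensional one, B maps the 3-dimensional space to a 2-dimensional one.
A = [[1, 2],
     [0, 1],
     [3, -1]]
B = [[2, 0, 1],
     [1, -1, 4]]

x = [5, -2]                                   # coordinates of a vector x
step_by_step = mat_vec(B, mat_vec(A, x))      # coordinates of B(A(x))
via_product = mat_vec(mat_mul(B, A), x)       # coordinates obtained from BA
print(step_by_step == via_product)
```

Note that the factors appear in the same order as in the composition: the transformation applied first stands on the right.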

We observe that we have thus obtained a shorter and more natural proof of the 
associativity of matrix multiplication (formula (2.52)) in Sect. 2.9. Indeed, the asso- 
ciativity of the composition of arbitrary mappings of sets is well known (p. xiv), and 
in view of the established connection between linear transformations and their ma- 
trices (in whatever selected bases), we obtain the associativity of the matrix product. 

The operations of addition and composition of linear transformations are con- 
nected by the relationships 


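$$\mathcal{A}(\mathcal{B} + \mathcal{C}) = \mathcal{AB} + \mathcal{AC}, \qquad (\mathcal{A} + \mathcal{B})\mathcal{C} = \mathcal{AC} + \mathcal{BC},$$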


called the distributive property. To prove this, one may either use the definitions of
addition and composition given above together with the well-known distributive
property of the real and complex numbers (or of the elements of any field $\mathbb{K}$,
since it is one of the properties of a field), or derive the distributivity of linear
transformations from what was proved in Sect. 2.9 regarding the distributivity of
addition and multiplication of matrices (formula (2.53)), again using the connection
established above between a linear transformation and its matrix.


3.4 Change of Coordinates 


We have seen that the coordinates of a vector relative to a basis depend on which 
basis in the vector space we have chosen. We have seen as well that the matrix of a 
linear transformation of vector spaces depends on the choice of bases in both vector 
spaces. We shall now establish an explicit form of this dependence both for vectors 
and for transformations. 

Let $e_1, \ldots, e_n$ be a certain basis of the vector space L. By Corollary 3.34, a basis
of the given vector space consists of a fixed number of vectors, equal to $\dim L$.
Let $e'_1, \ldots, e'_n$ be another basis of L. By definition, every vector $x \in L$ is a linear
combination of the vectors $e_1, \ldots, e_n$, that is, it can be expressed in the form

$$x = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n \tag{3.31}$$




with coefficients $\alpha_i$, which are the coordinates of $x$ in the basis $e_1, \ldots, e_n$. Similarly,
we have the representation

$$x = \alpha'_1 e'_1 + \alpha'_2 e'_2 + \cdots + \alpha'_n e'_n \tag{3.32}$$

with coordinates $\alpha'_i$ of the vector $x$ in the basis $e'_1, \ldots, e'_n$.

Furthermore, each of the vectors $e'_1, \ldots, e'_n$ is itself a linear combination of the
vectors $e_1, \ldots, e_n$, that is,

$$
\begin{aligned}
e'_1 &= c_{11} e_1 + c_{21} e_2 + \cdots + c_{n1} e_n, \\
e'_2 &= c_{12} e_1 + c_{22} e_2 + \cdots + c_{n2} e_n, \\
&\;\;\vdots \\
e'_n &= c_{1n} e_1 + c_{2n} e_2 + \cdots + c_{nn} e_n
\end{aligned}
\tag{3.33}
$$

with some scalars $c_{ij}$. And similarly, each of the vectors $e_1, \ldots, e_n$ is a linear
combination of $e'_1, \ldots, e'_n$, that is,

$$
\begin{aligned}
e_1 &= c'_{11} e'_1 + c'_{21} e'_2 + \cdots + c'_{n1} e'_n, \\
e_2 &= c'_{12} e'_1 + c'_{22} e'_2 + \cdots + c'_{n2} e'_n, \\
&\;\;\vdots \\
e_n &= c'_{1n} e'_1 + c'_{2n} e'_2 + \cdots + c'_{nn} e'_n
\end{aligned}
\tag{3.34}
$$

for some scalars $c'_{ij}$.

Clearly, the collections of coefficients $c_{ij}$ and $c'_{ij}$ in formulas (3.33) and (3.34)
provide the exact same information about the "mutual relationship" between the
bases $e_1, \ldots, e_n$ and $e'_1, \ldots, e'_n$ in the space L, and therefore it suffices for us to know
only one (either one will do) of these collections. More detailed information about
the relationship between the coefficients $c_{ij}$ and $c'_{ij}$ will be given below, but first,
we shall deduce a formula that describes the relationship between the coordinates of
the vector $x$ in the bases $e_1, \ldots, e_n$ and $e'_1, \ldots, e'_n$. To this end, we shall substitute
the expressions (3.33) for the vectors $e'_i$ into (3.32). Grouping the requisite terms,
we obtain an expression for the vector $x$ as a linear combination of $e_1, \ldots, e_n$:


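$$
\begin{aligned}
x &= \alpha'_1 (c_{11} e_1 + c_{21} e_2 + \cdots + c_{n1} e_n) + \cdots + \alpha'_n (c_{1n} e_1 + c_{2n} e_2 + \cdots + c_{nn} e_n) \\
  &= (\alpha'_1 c_{11} + \alpha'_2 c_{12} + \cdots + \alpha'_n c_{1n})\, e_1 + \cdots + (\alpha'_1 c_{n1} + \alpha'_2 c_{n2} + \cdots + \alpha'_n c_{nn})\, e_n.
\end{aligned}
$$

Since $e_1, \ldots, e_n$ is a basis of the vector space L and the coordinates of the vector $x$
in this basis are $\alpha_i$ (formula (3.31)), we obtain

$$
\begin{aligned}
\alpha_1 &= c_{11} \alpha'_1 + c_{12} \alpha'_2 + \cdots + c_{1n} \alpha'_n, \\
\alpha_2 &= c_{21} \alpha'_1 + c_{22} \alpha'_2 + \cdots + c_{2n} \alpha'_n, \\
&\;\;\vdots \\
\alpha_n &= c_{n1} \alpha'_1 + c_{n2} \alpha'_2 + \cdots + c_{nn} \alpha'_n.
\end{aligned}
\tag{3.35}
$$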




Relationships (3.35) are called change-of-coordinate formulas for a vector. Such
a formula represents a linear change of variables, with the help of the matrix $C$
consisting of the coefficients $c_{ij}$, but in an order different from that in (3.33). In
particular, $C$ is the transpose of the matrix of coefficients of (3.33). The matrix $C$ is
called the transition matrix from the basis $e'_1, \ldots, e'_n$ to the basis $e_1, \ldots, e_n$, since
with its help, the coordinates of a vector in the basis $e_1, \ldots, e_n$ are expressed in
terms of its coordinates in the basis $e'_1, \ldots, e'_n$.

Using the product rule for matrices, the formula for the change of coordinates
can be written in a more compact form. To this end, we shall use notation from the
preceding section: $\alpha$ is a row vector consisting of the coordinates $\alpha_1, \ldots, \alpha_n$, and
$[\alpha]$ is a column vector consisting of the very same coordinates. Keeping in mind the
definition of matrix multiplication (Sect. 2.9), we see that formula (3.35) takes the
form

$$[\alpha] = C[\alpha'] \quad \text{or} \quad \alpha = \alpha' C^{*}. \tag{3.36}$$
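To make formulas (3.33), (3.35), and (3.36) concrete, here is a small worked example in a two-dimensional space. The particular basis vectors and coordinates are hypothetical, chosen only for illustration.

```latex
% Hypothetical new basis, written as in (3.33):
\[
e'_1 = e_1 + e_2, \qquad e'_2 = e_1 + 2e_2,
\qquad\text{so that}\qquad
C = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.
\]
% A vector with coordinates (3, -1) in the basis e'_1, e'_2, expanded directly:
\[
x = 3e'_1 - e'_2 = 3(e_1 + e_2) - (e_1 + 2e_2) = 2e_1 + e_2.
\]
% The same coordinates obtained from the column form of (3.36):
\[
[\alpha] = C[\alpha']
= \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}
  \begin{pmatrix} 3 \\ -1 \end{pmatrix}
= \begin{pmatrix} 2 \\ 1 \end{pmatrix}.
\]
```

Observe that the columns of $C$ are the coordinates of $e'_1$ and $e'_2$ in the basis $e_1, e_2$, which is exactly the transposition mentioned above.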


Remark 3.57 It is not difficult to see that the formulas for changing coordinates are 
quite similar to the formulas for a linear transformation. More precisely, relation- 
ships (3.35) and (3.36) are special cases of (3.25) and (3.27) for m =n, for exam- 
ple, if the vector space M coincides with L. This allows an interpretation of changing 
coordinates (that is, changing bases) of a vector space L as a linear transformation 
$\mathcal{A}: L \to L$.

Similarly, if we substitute expressions (3.34) for the vectors $e_i$ into (3.31), we obtain
the relationship

$$
\begin{aligned}
\alpha'_1 &= c'_{11} \alpha_1 + c'_{12} \alpha_2 + \cdots + c'_{1n} \alpha_n, \\
\alpha'_2 &= c'_{21} \alpha_1 + c'_{22} \alpha_2 + \cdots + c'_{2n} \alpha_n, \\
&\;\;\vdots \\
\alpha'_n &= c'_{n1} \alpha_1 + c'_{n2} \alpha_2 + \cdots + c'_{nn} \alpha_n,
\end{aligned}
\tag{3.37}
$$

similar to (3.35). Formula (3.37) is also called the substitution formula for
coordinates of a vector. It represents the linear substitution of variables with the matrix $C'$,
which is the transpose of the matrix consisting of the coefficients $c'_{ij}$ from (3.34).
The matrix $C'$ is called the transition matrix from the basis $e_1, \ldots, e_n$ to the basis
$e'_1, \ldots, e'_n$. In matrix form, formula (3.37) takes the form

$$[\alpha'] = C'[\alpha] \quad \text{or} \quad \alpha' = \alpha\, {C'}^{*}. \tag{3.38}$$

Using formulas (3.36) and (3.38), one easily establishes the connection between $C$
and $C'$.


Lemma 3.58 The transition matrices $C$ and $C'$ between any two bases of a vector
space are nonsingular and are the inverses of each other. That is, $C' = C^{-1}$.


Proof Substituting the expression $[\alpha'] = C'[\alpha]$ into $[\alpha] = C[\alpha']$, taking into
account the associativity of matrix multiplication, we obtain the equality
$[\alpha] = (CC')[\alpha]$. This equality holds for all column vectors $[\alpha]$ of a given length $n$, and
therefore, the matrix $CC'$ on the right-hand side is the identity matrix. Indeed,
rewriting this equality in the equivalent form $(CC' - E)[\alpha] = 0$, it becomes clear
that if the matrix $CC' - E$ contains at least one nonzero element, then there
exists a column vector $[\alpha]$ for which $(CC' - E)[\alpha] \neq 0$. Thus we conclude that
$CC' = E$, from which by definition of the inverse matrix (see Sect. 2.10), it
follows that $C' = C^{-1}$.
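Lemma 3.58 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the transition matrix `C` is a hypothetical example (its columns are the coordinates of $e'_1 = e_1 + e_3$, $e'_2 = e_1 + e_2$, $e'_3 = e_2 + e_3$ in the basis $e_1, e_2, e_3$), and the helper `inverse` is an ordinary Gauss-Jordan elimination over the rationals, not a method taken from the text. It checks that $CC' = C'C = E$ and that converting coordinates there and back returns the original column.

```python
from fractions import Fraction


def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the column x."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]


def mat_mul(P, Q):
    """Matrix product PQ."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def inverse(M):
    """Invert a nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(M)
    # Augment M with the identity matrix, using exact rational arithmetic.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # For a nonsingular matrix a nonzero pivot always exists
        # (for a singular one, next() raises StopIteration).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical transition matrix from the basis e' to the basis e.
C = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
C_prime = inverse(C)                  # transition matrix from e to e'
identity = [[int(i == j) for j in range(3)] for i in range(3)]

alpha_new = [2, -1, 3]                # coordinates in the basis e'
alpha_old = mat_vec(C, alpha_new)     # formula (3.36)
round_trip = mat_vec(C_prime, alpha_old)  # formula (3.38)

print(mat_mul(C, C_prime) == identity, mat_mul(C_prime, C) == identity)
print(round_trip == alpha_new)
```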


We shall now explain how the matrix of a linear transformation depends on the
choice of bases. Suppose that in the bases $e_1, \ldots, e_n$ and $f_1, \ldots, f_m$ of the vector
spaces L and M the transformation $\mathcal{A}: L \to M$ has matrix $A$, the coordinates of the
vector $x$ are denoted by $\alpha_i$, and the coordinates of the vector $\mathcal{A}(x)$ are denoted by
$\beta_j$. Similarly, in the bases $e'_1, \ldots, e'_n$ and $f'_1, \ldots, f'_m$ of these vector spaces, the
same transformation $\mathcal{A}: L \to M$ has matrix $A'$, the coordinates of the vector $x$ are
denoted by $\alpha'_i$, and the coordinates of the vector $\mathcal{A}(x)$ are denoted by $\beta'_j$.

Let $C$ be the transition matrix from the basis $e'_1, \ldots, e'_n$ to the basis $e_1, \ldots, e_n$,
which is a nonsingular matrix of order $n$, while $D$ is the transition matrix from the
basis $f'_1, \ldots, f'_m$ to the basis $f_1, \ldots, f_m$, which is a nonsingular matrix of order
$m$ (here $n$ and $m$ are the dimensions of the vector spaces L and M). Then by the
change-of-coordinates formula (3.38) and Lemma 3.58, we obtain

$$[\alpha'] = C^{-1}[\alpha], \qquad [\beta'] = D^{-1}[\beta],$$

and formula (3.27) of the linear transformation gives us the equalities

$$[\beta] = A[\alpha], \qquad [\beta'] = A'[\alpha'].$$

Let us substitute on the right-hand side of the equality $[\beta'] = D^{-1}[\beta]$ the
expression $[\beta] = A[\alpha]$, and on the left-hand side, the expression
$[\beta'] = A'[\alpha'] = A'C^{-1}[\alpha]$, as a result of which we obtain the relationship

$$A'C^{-1}[\alpha] = D^{-1}A[\alpha]. \tag{3.39}$$


This line of argument holds for any vector $x \in L$, and hence equality (3.39) holds
for any column vector $[\alpha]$ of length $n$. Clearly, this is possible if and only if we have
the equality

$$A'C^{-1} = D^{-1}A. \tag{3.40}$$

Indeed, both matrices $A'C^{-1}$ and $D^{-1}A$ are of type $(m, n)$, and if they were not
equal, then there would be at least one row (with index $i$ between 1 and $m$) and
one column (with index $j$ between 1 and $n$) such that the $ij$th elements of the
matrices $A'C^{-1}$ and $D^{-1}A$ did not coincide. But then one could easily identify a
column vector $[\alpha]$ for which the equality (3.39) was not satisfied. For example, set
its element $\alpha_j$ equal to 1, and all the rest to zero.

Let us note that we could have obtained formula (3.40) by considering the
transition from one basis to another as a linear transformation of vector spaces given
by multiplication by the transition matrix (see Remark 3.57 above). Indeed, in this
case, we obtain the following diagram, in which each arrow indicates multiplication
of a column vector by the matrix next to it:

$$
\begin{array}{ccc}
[\alpha] & \xrightarrow{\;A\;} & [\beta] \\[2pt]
{\scriptstyle C^{-1}}\big\downarrow & & \big\downarrow{\scriptstyle D^{-1}} \\[2pt]
[\alpha'] & \xrightarrow{\;A'\;} & [\beta']
\end{array}
$$

By the definition of matrix multiplication, from the vector $[\alpha]$, we can obtain the
vector $[\beta']$ located in the opposite corner of the diagram in two ways: multiplication
by the matrix $A'C^{-1}$ and multiplication by the matrix $D^{-1}A$. Both methods should
give the same result (in such a case, we say that the diagram is commutative, and this
is equivalent to equality (3.40)).

We can multiply both sides of (3.40) on the right by the matrix C, obtaining as a 
result 


$$A' = D^{-1}AC, \tag{3.41}$$


which is called the formula for a change of matrix of a linear transformation. 
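The change-of-matrix formula can be tested on a small example. In the following sketch all numerical data (`A`, `C`, `D`, `alpha_new`) and helper names are hypothetical, chosen for illustration only. It computes $A' = D^{-1}AC$ and then checks that the two descriptions of the transformation agree: the new coordinates $[\beta'] = A'[\alpha']$, converted back by $D$, must coincide with $[\beta] = A[\alpha]$, where $[\alpha] = C[\alpha']$.

```python
from fractions import Fraction


def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the column x."""
    return [sum(m * t for m, t in zip(row, x)) for row in M]


def mat_mul(P, Q):
    """Matrix product PQ."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def inverse_2x2(D):
    """Inverse of a nonsingular 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = D
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical data: dim L = 3, dim M = 2.
A = [[1, 0, 2],          # matrix of the transformation in the old bases
     [0, 1, -1]]
C = [[1, 1, 0],          # transition matrix from e' to e (order n = 3)
     [0, 1, 1],
     [1, 0, 1]]
D = [[2, 1],             # transition matrix from f' to f (order m = 2)
     [1, 1]]

A_new = mat_mul(inverse_2x2(D), mat_mul(A, C))   # formula (3.41)

alpha_new = [1, 2, 3]                  # coordinates of x in the basis e'
alpha_old = mat_vec(C, alpha_new)      # coordinates of x in the basis e
beta_old = mat_vec(A, alpha_old)       # coordinates of A(x) in the basis f
beta_new = mat_vec(A_new, alpha_new)   # coordinates of A(x) in the basis f'

# Converting beta_new back to the basis f must reproduce beta_old.
print(mat_vec(D, beta_new) == beta_old)
```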

In the case that the dimensions n and m of the vector spaces L and M coincide, 
both matrices A and A’ are square (of order n = m), and for such matrices, one has 
the notion of the determinant. Then by Theorem 2.54, from formula (3.41), there 
follows the relationship 


$$|A'| = |D^{-1}| \cdot |A| \cdot |C| = |D|^{-1} \cdot |A| \cdot |C|. \tag{3.42}$$


Since $C$ and $D$ are transition matrices, they are nonsingular, and therefore the
determinants $|A'|$ and $|A|$ differ from each other through multiplication by the number
$|D|^{-1}|C| \neq 0$. This indicates that if the matrix of a linear transformation of spaces
of the same dimension is nonsingular for some choice of bases, then it will be
nonsingular for any other choice of bases for these spaces. Therefore, we may make the
following definition.


Definition 3.59 A linear transformation of spaces of the same dimension is said to 
be nonsingular if its matrix (expressed in terms of some choice of bases of the two 
spaces) is nonsingular. 


There is a special case, which is of greatest importance for a variety of
applications to which Chaps. 4 and 5 will be devoted, in which the spaces L and M coincide
(that is, $\mathcal{A}$ is a linear transformation of a vector space into itself and so $n = m$),
the basis $e_1, \ldots, e_n$ coincides with the basis $f_1, \ldots, f_m$, and the basis $e'_1, \ldots, e'_n$
coincides with $f'_1, \ldots, f'_m$. Consequently, in this case, $D = C$, and the
change-of-matrix formula (3.41) is converted to

$$A' = C^{-1}AC, \tag{3.43}$$




and equation (3.42) assumes the very simple form $|A'| = |A|$. This means that
although the matrix of a linear transformation of a vector space L into itself depends
on the choice of basis, its determinant does not depend on the choice of basis. This
circumstance is frequently expressed by saying that the determinant of a linear
transformation of a vector space into itself is invariant under a change of basis. In
this case, we may give the following definition.


Definition 3.60 The determinant of a linear transformation $\mathcal{A}: L \to L$ of a vector
space to itself (denoted by $|\mathcal{A}|$) is the determinant of its matrix $A$, expressed in terms
of any basis of the space L, that is, $|\mathcal{A}| = |A|$.
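The independence of the determinant from the basis can be seen on a small worked example. The matrices below are hypothetical, chosen only so that the arithmetic is easy to follow.

```latex
% Hypothetical matrix A of a transformation and transition matrix C:
\[
A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}, \qquad
C = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
C^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}.
\]
% The matrix of the same transformation in the new basis, by (3.43):
\[
A' = C^{-1} A C
= \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}
  \begin{pmatrix} 3 & 4 \\ 3 & 6 \end{pmatrix}
= \begin{pmatrix} 3 & 2 \\ 0 & 2 \end{pmatrix}.
\]
% The matrices differ, but their determinants coincide:
\[
|A| = 2 \cdot 3 - 1 \cdot 0 = 6, \qquad |A'| = 3 \cdot 2 - 2 \cdot 0 = 6.
\]
```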


3.5 Isomorphisms of Vector Spaces 


In this section we shall investigate the case in which a linear transformation
$\mathcal{A}: L \to M$ is a bijection. We observe first of all that if $\mathcal{A}$ is a bijective linear
transformation from L to M, then like any bijective mapping (not necessarily linear),
it has an inverse mapping $\mathcal{A}^{-1}: M \to L$. It is clear that $\mathcal{A}^{-1}$ will also be a linear
transformation from M to L. Indeed, if for the vector $y_1 \in M$ there is a unique vector
$x_1 \in L$ such that $\mathcal{A}(x_1) = y_1$, and for $y_2 \in M$ there is an analogous vector $x_2 \in L$
such that $\mathcal{A}(x_2) = y_2$, then $\mathcal{A}(x_1 + x_2) = y_1 + y_2$, and by the definition of inverse
mapping, we obtain the first of conditions (3.21) in the definition of a linear
transformation:

$$\mathcal{A}^{-1}(y_1 + y_2) = x_1 + x_2 = \mathcal{A}^{-1}(y_1) + \mathcal{A}^{-1}(y_2).$$

Similarly, but even more simply, we can verify the second condition of (3.21), that
is, that $\mathcal{A}^{-1}(\alpha y) = \alpha \mathcal{A}^{-1}(y)$ for an arbitrary vector $y \in M$ and scalar $\alpha$.


Definition 3.61 Vector spaces L and M between which there exists a bijective linear
transformation $\mathcal{A}$ are said to be isomorphic, and the transformation $\mathcal{A}$ itself is called
an isomorphism. The fact that vector spaces L and M are isomorphic is denoted by
$L \simeq M$. If we wish to specify a concrete transformation $\mathcal{A}: L \to M$ that produces the
isomorphism, then we write $\mathcal{A}: L \xrightarrow{\sim} M$.


The property of being isomorphic defines an equivalence relation on the set of
all vector spaces (see the definition on p. xii). To prove this, we need to verify three
properties: reflexivity, symmetry, and transitivity. Reflexivity is obvious: we have
simply to consider the identity mapping $\mathcal{E}: L \xrightarrow{\sim} L$. Symmetry is also obvious: if
$\mathcal{A}: L \xrightarrow{\sim} M$, then the inverse transformation $\mathcal{A}^{-1}$ is also an isomorphism, that is,
$\mathcal{A}^{-1}: M \xrightarrow{\sim} L$. Finally, if $\mathcal{A}: L \xrightarrow{\sim} M$ and $\mathcal{B}: M \xrightarrow{\sim} N$, then, as is easily verified, the
transformation $\mathcal{C} = \mathcal{BA}$ is also an isomorphism, that is, $\mathcal{C}: L \xrightarrow{\sim} N$, which
establishes transitivity. Therefore, the set of all vector spaces can be represented as a
collection of equivalence classes of vector spaces whose elements are mutually
isomorphic.




Example 3.62 With the choice of a basis $e_1, \ldots, e_n$ in a vector space L over a field
$\mathbb{K}$, assigning to a vector $x \in L$ the row consisting of its coordinates in this basis
establishes an isomorphism between L and the row space $\mathbb{K}^n$. Similarly, writing the
elements of a row in the form of a column produces an isomorphism between the row
space and the column space (with rows and columns containing the same number of
elements). This explains why we use a single symbol for denoting these spaces.


Example 3.63 Through the selection of bases $e_1, \ldots, e_n$ and $f_1, \ldots, f_m$ in the
spaces L and M of dimensions $n$ and $m$, we assign to each linear transformation
$\mathcal{A}: L \to M$ its matrix $A$. We thereby establish an isomorphism between the space
$\mathcal{L}(L, M)$ and the space of rectangular matrices of type $(m, n)$.


Theorem 3.64 Two finite-dimensional vector spaces L and M are isomorphic if and 
only if dimL = dimM. 


Proof The fact that all vector spaces of a given finite dimension are isomorphic
follows easily from the fact that every vector space L of finite dimension $n$ is
isomorphic to the space $\mathbb{K}^n$ of rows or columns of length $n$ (Example 3.62). Indeed,
let L and M be two vector spaces of dimension $n$. Then $L \simeq \mathbb{K}^n$ and $M \simeq \mathbb{K}^n$, from
which as a result of transitivity and symmetry, we obtain $L \simeq M$.

We now prove that isomorphic vector spaces L and M have the same dimension.
Let us assume that $\mathcal{A}: L \xrightarrow{\sim} M$ is an isomorphism. Let us denote by $0 \in L$ and $0' \in M$
the null vectors in the spaces L and M. Recall, by the property of linear
transformations that we proved on p. 102, that $\mathcal{A}(0) = 0'$. Let $\dim M = m$, and let us choose
in M some basis $f_1, \ldots, f_m$. Since an isomorphism is a bijection, there exist vectors
$e_1, \ldots, e_m \in L$ such that $f_i = \mathcal{A}(e_i)$ for $i = 1, \ldots, m$. We shall prove
that the vectors $e_1, \ldots, e_m$ form a basis of the space L, whence it will follow that
$\dim L = m$, completing the proof of the theorem.

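First of all, let us show that these vectors are linearly independent. Indeed, if
$e_1, \ldots, e_m$ were linearly dependent, then there would exist scalars $\alpha_1, \ldots, \alpha_m$, not
all equal to zero, such that

$$\alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_m e_m = 0.$$

But after applying the linear transformation $\mathcal{A}$ to both sides of this relationship, in
view of the equality $\mathcal{A}(0) = 0'$, we would obtain

$$\alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_m f_m = 0',$$

from which follows $\alpha_1 = 0, \ldots, \alpha_m = 0$, since by assumption, the vectors
$f_1, \ldots, f_m$ are linearly independent. This contradicts the choice of the scalars
$\alpha_i$, and so the vectors $e_1, \ldots, e_m$ are linearly independent.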

Let us now prove that every vector $x \in L$ is a linear combination of the vectors
$e_1, \ldots, e_m$. Let us set $\mathcal{A}(x) = y$ and express $y$ in the form

$$y = \alpha_1 f_1 + \alpha_2 f_2 + \cdots + \alpha_m f_m.$$

Applying to both sides of this equality the linear transformation $\mathcal{A}^{-1}$, we obtain

$$x = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_m e_m,$$

as required. We have thus shown that the vectors $e_1, \ldots, e_m$ form a basis of the
vector space L.


Example 3.65 Suppose we are given a system of $m$ homogeneous linear equations
in $n$ unknowns $x_1, \ldots, x_n$ with coefficients in the field $\mathbb{K}$. As we saw in
Example 3.8 (p. 84), its solutions form a subspace $L'$ of the space $\mathbb{K}^n$ of rows of length $n$.
Since we know that the dimension of the space $\mathbb{K}^n$ is $n$, it follows that $\dim L' \le n$.
Let us determine this dimension. To this end, using Theorem 1.15, let us bring our
system into echelon form (1.18). Since the equations of the original system are
homogeneous, it follows that in (1.18), all the equations will also be homogeneous,
that is, all the constant terms $b_i$ are equal to 0. Let $r$ be the number of principal
unknowns, and hence $n - r$ is the number of free unknowns. As shown following the
proof of Theorem 1.15, we shall obtain all the solutions of our system by assigning
arbitrary values to the free unknowns and then determining the principal unknowns
from the first $r$ equations. That is, if $(x_1, \ldots, x_n)$ is some solution, then associating
with it the row of values of the free unknowns $(x_{i_1}, \ldots, x_{i_{n-r}})$, we obtain a bijection
between the set of solutions of the system and rows of length $n - r$. An obvious
verification shows that this relationship is an isomorphism of the spaces $\mathbb{K}^{n-r}$ and
$L'$. Since $\dim \mathbb{K}^{n-r} = n - r$, then by Theorem 3.64, the dimension of the space $L'$
is also equal to $n - r$. Finally, we observe that the number $r$ is equal to the rank of
the matrix of the system (see Sect. 2.8). Therefore, we have obtained the following
result: the space of solutions of a homogeneous linear system of equations has
dimension $n - r$, where $n$ is the number of unknowns, and $r$ is the rank of the matrix
of the system.
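The construction of Example 3.65 translates directly into a computation. The sketch below is an illustration under stated assumptions: the system `system` is a hypothetical example, the helper names `rref` and `solution_basis` are introduced here, and the reduction goes all the way to the reduced echelon form (Gauss-Jordan), which is a convenient variant of the echelon form (1.18) rather than the exact procedure of Theorem 1.15. Each free unknown in turn is set to 1 and the others to 0, which yields $n - r$ solutions forming a basis of the solution space.

```python
from fractions import Fraction


def rref(rows):
    """Return the reduced echelon form of a matrix and its pivot columns."""
    M = [[Fraction(v) for v in row] for row in rows]
    pivots = []
    r = 0                                   # index of the next pivot row
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue                        # the unknown with index c is free
        M[r], M[pivot] = M[pivot], M[r]
        p = M[r][c]
        M[r] = [v / p for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [v - f * w for v, w in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots


def solution_basis(rows):
    """Basis of the solution space of the homogeneous system and its rank."""
    M, pivots = rref(rows)
    n = len(rows[0])
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)                  # one free unknown equals 1
        for row_index, p in enumerate(pivots):
            x[p] = -M[row_index][f]         # principal unknowns are determined
        basis.append(x)
    return basis, len(pivots)


# Hypothetical system of m = 3 equations in n = 4 unknowns
# (the third equation is the sum of the first two).
system = [[1, 2, -1, 1],
          [2, 4, 1, -1],
          [3, 6, 0, 0]]

basis, rank = solution_basis(system)
print(rank, len(basis))                     # r and n - r
```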


Let $\mathcal{A}: L \xrightarrow{\sim} M$ be an isomorphism of vector spaces L and M of dimension $n$,
and let $e_1, \ldots, e_n$ be a basis of L. Then the vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_n)$ are linearly
independent. Indeed, if not, we would have the equality

$$\alpha_1 \mathcal{A}(e_1) + \cdots + \alpha_n \mathcal{A}(e_n) = \mathcal{A}(\alpha_1 e_1 + \cdots + \alpha_n e_n) = 0'$$

with scalars $\alpha_i$ not all equal to zero, from which by the property $\mathcal{A}(0) = 0'$ and the
fact that $\mathcal{A}$ is a bijection, we obtain the relationship $\alpha_1 e_1 + \cdots + \alpha_n e_n = 0$,
contradicting the definition of basis. Hence the vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_n)$ form a basis
of the vector space M. It is easy to see that in these bases, the matrix of the
transformation $\mathcal{A}$ is the identity matrix of order $n$, and the coordinates of an arbitrary
vector $x \in L$ in the basis $e_1, \ldots, e_n$ coincide with the coordinates of the vector $\mathcal{A}(x)$
in the basis $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_n)$. Consequently, the transformation $\mathcal{A}$ is nonsingular.

A similar argument easily establishes the converse fact that an arbitrary
nonsingular linear transformation $\mathcal{A}: L \to M$ of vector spaces of the same dimension is an
isomorphism.




Remark 3.66 Theorem 3.64 shows that all assertions formulated in terms of con- 
cepts entering the definition of a vector space are equivalent for all spaces of a given 
dimension. In other words, there exists a single, unique theory of n-dimensional 
vector spaces for a given n. An example of the opposite situation can be found 
in Euclidean geometry and the non-Euclidean geometry of Lobachevsky. It is well 
known that if we accept all the axioms of Euclid except for the “parallel postulate” 
(so-called absolute geometry), then there are two completely different geometries 
that satisfy these axioms: Euclid’s and Lobachevsky’s. With vector spaces, such a 
situation does not arise. 


The definition of an isomorphism $\mathcal{A}: L \to M$ consists of two parts. The first
asserts that for an arbitrary vector $y \in M$, there exists a vector $x \in L$ such that
$\mathcal{A}(x) = y$, that is, the image $\mathcal{A}(L)$ coincides with the entire space M. The second
condition is that the equality $\mathcal{A}(x_1) = \mathcal{A}(x_2)$ holds only for $x_1 = x_2$. Since $\mathcal{A}$ is a
linear transformation, then for the latter condition to be satisfied, it is necessary
that the equality $\mathcal{A}(x) = 0'$ imply $x = 0$. This motivates the following definition.


Definition 3.67 The set of vectors $x$ in the space L such that $\mathcal{A}(x) = 0'$ is called the
kernel of the linear transformation $\mathcal{A}$.⁵ In other words, the kernel is the preimage of
the null vector under the mapping $\mathcal{A}$.


It is obvious that the kernel of a linear transformation $\mathcal{A}: L \to M$ is a subspace
of L, and that its image $\mathcal{A}(L)$ is a subspace of M.

Thus to satisfy the second condition in the definition of a bijection, it is necessary
that the kernel of $\mathcal{A}$ consist of the null vector alone. But this condition is sufficient as
well. Indeed, if for vectors $x_1 \neq x_2$ the condition $\mathcal{A}(x_1) = \mathcal{A}(x_2)$ is satisfied, then
subtracting one side of the equality from the other and applying the linearity of the
transformation $\mathcal{A}$, we obtain $\mathcal{A}(x_1 - x_2) = 0'$, that is, the nonzero vector $x_1 - x_2$ is
in the kernel of $\mathcal{A}$. Therefore, the linear transformation $\mathcal{A}: L \to M$ is an isomorphism if
and only if its image coincides with all of M and its kernel is equal to $(0)$. We shall
now show that if $\mathcal{A}$ is a linear transformation of spaces of the same finite
dimension, then an isomorphism results if either one or the other of the two conditions is
satisfied.


Theorem 3.68 If $\mathcal{A}: L \to M$ is a linear transformation of vector spaces of the same
finite dimension and the kernel of $\mathcal{A}$ is equal to $(0)$, then $\mathcal{A}$ is an isomorphism.


Proof Let $\dim L = \dim M = n$. Let us consider a particular basis $e_1, \ldots, e_n$ of the
vector space L. The transformation $\mathcal{A}$ maps each vector $e_i$ to some vector
$f_i = \mathcal{A}(e_i)$ of the space M. Then the vectors $f_1, \ldots, f_n$ are linearly independent, that is,
they form a basis of the space M. Indeed, from the linearity of the transformation $\mathcal{A}$,
for arbitrary scalars $\alpha_1, \ldots, \alpha_n$, we have the equality

$$\mathcal{A}(\alpha_1 e_1 + \cdots + \alpha_n e_n) = \alpha_1 f_1 + \cdots + \alpha_n f_n. \tag{3.44}$$

If $\alpha_1 f_1 + \cdots + \alpha_n f_n = 0'$ for some collection of scalars $\alpha_1, \ldots, \alpha_n$, then from the
condition that the kernel of $\mathcal{A}$ is equal to $(0)$, we will have $\alpha_1 e_1 + \cdots + \alpha_n e_n = 0$,
from which it follows, by the definition of a basis, that all the scalars $\alpha_i$ are equal
to zero. The relationship (3.44) also shows that the transformation $\mathcal{A}$ maps each
vector $x \in L$ with coordinates $(\alpha_1, \ldots, \alpha_n)$ in the basis $e_1, \ldots, e_n$ into the vector of M
with the same coordinates in the corresponding basis $f_1, \ldots, f_n$ (the matrix of the
transformation $\mathcal{A}$ in such bases is the identity matrix of order $n$).

By the definition of an isomorphism, it suffices to prove that for an arbitrary
vector $y \in M$, there exists a vector $x \in L$ such that $\mathcal{A}(x) = y$. Since the vectors
$f_1, \ldots, f_n$ form a basis of the space M, it follows that $y$ can be expressed as a linear
combination of these vectors with certain coefficients $(\alpha_1, \ldots, \alpha_n)$, from which by
the linearity of $\mathcal{A}$ it follows that

$$y = \alpha_1 f_1 + \cdots + \alpha_n f_n = \mathcal{A}(\alpha_1 e_1 + \cdots + \alpha_n e_n) = \mathcal{A}(x)$$

with the vector $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$, which completes the proof of the theorem.


⁵ Translator's note: Another name for kernel that the reader may encounter is null space (since the
kernel is the space of all vectors that map to the null vector).


Theorem 3.69 If $\mathcal{A}: L \to M$ is a linear transformation of vector spaces of the same
finite dimension and the image $\mathcal{A}(L)$ is equal to M, then $\mathcal{A}$ is an isomorphism.


Proof Let $f_1, \ldots, f_n$ be a basis of the vector space M. By the condition of the
theorem, for each $f_i$, there exists a vector $e_i \in L$ such that $f_i = \mathcal{A}(e_i)$. We shall
show that the vectors $e_1, \ldots, e_n$ are linearly independent and therefore form a basis
of L. Indeed, if there existed a collection of scalars $\alpha_1, \ldots, \alpha_n$ such that
$\alpha_1 e_1 + \cdots + \alpha_n e_n = 0$, then by $\mathcal{A}(0) = 0'$ and the linearity of $\mathcal{A}$, we would have the equality

$$\mathcal{A}(\alpha_1 e_1 + \cdots + \alpha_n e_n) = \alpha_1 \mathcal{A}(e_1) + \cdots + \alpha_n \mathcal{A}(e_n) = \alpha_1 f_1 + \cdots + \alpha_n f_n = 0',$$

from which by the definition of basis it would follow that all the $\alpha_i$ are equal to 0.
That is, the vectors $e_1, \ldots, e_n$ indeed form a basis of the space L.

It follows from the definition of a basis that an arbitrary vector $x \in L$ can be
written as $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$. From this, we obtain

$$
\begin{aligned}
\mathcal{A}(x) &= \mathcal{A}(\alpha_1 e_1 + \cdots + \alpha_n e_n) = \alpha_1 \mathcal{A}(e_1) + \cdots + \alpha_n \mathcal{A}(e_n) \\
&= \alpha_1 f_1 + \cdots + \alpha_n f_n.
\end{aligned}
$$

If $\mathcal{A}(x) = 0'$, then we have $\alpha_1 f_1 + \cdots + \alpha_n f_n = 0'$, which is possible only if all
the $\alpha_i$ are equal to 0, since the vectors $f_1, \ldots, f_n$ form a basis of the space M. But
then, clearly, the vector $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ equals 0. Therefore, the kernel of the
transformation $\mathcal{A}$ consists solely of the null vector, and by Theorem 3.68, $\mathcal{A}$ is an
isomorphism.




It is not difficult to see that the theorems proved just above give us the following 
result. 


Theorem 3.70 A linear transformation $\mathcal{A}: L \to M$ between vector spaces of the
same finite dimension is an isomorphism if and only if it is nonsingular.


In other words, Theorem 3.70 asserts that for spaces of the same finite dimension, 
the notion of a nonsingular transformation coincides with that of isomorphism. 

With the proof of Theorem 3.68 we have also established one important fact:
a nonsingular linear transformation $\mathcal{A}: L \to M$ of vector spaces of the same finite
dimension maps a basis $e_1, \ldots, e_n$ of the space L to a basis $f_1, \ldots, f_n$ of the space
M, and every vector $x \in L$ with coordinates $(\alpha_1, \ldots, \alpha_n)$ in the first basis is mapped
to the vector $\mathcal{A}(x) \in M$ with the same coordinates relative to the second basis. This
clearly follows from formula (3.44).

Thus it is possible to define a nonsingular transformation $\mathcal{A}: L \to M$ by stating
that it maps a particular basis $e_1, \ldots, e_n$ of the space L into a basis $f_1, \ldots, f_n$ of the
space M, and an arbitrary vector $x \in L$ with coordinates $(\alpha_1, \ldots, \alpha_n)$ with respect
to the basis $e_1, \ldots, e_n$ into the vector of M with the same coordinates with respect
to the basis $f_1, \ldots, f_n$. Later, we will make use of this method in the case $L = M$,
when we will be studying certain special subsets $X \subset L$, primarily quadrics. The
basic idea is that subsets $X$ and $Y$ are mapped into each other using a certain
nonsingular mapping $\mathcal{A}: L \to L$ (that is, $Y = \mathcal{A}(X)$) if and only if there exist two bases
$e_1, \ldots, e_n$ and $f_1, \ldots, f_n$ of the vector space L such that the condition of the vector
$x$ belonging to the subset $X$ in coordinates relative to the basis $e_1, \ldots, e_n$ coincides
with the condition of the same vector belonging to $Y$ in coordinates relative to the
basis $f_1, \ldots, f_n$.

In conclusion, let us return once more to Theorem 1.12, proved in Sect. 1.2, and 
Corollary 1.13 (Fredholm alternative; see p. 11). This theorem and corollary are 
now completely obvious, obtained as trivial consequences of a more general result. 

Indeed, as we saw in Sect. 2.9, a system of $n$ linear equations in $n$ unknowns can
be written in matrix form $A[x] = [b]$, where $A$ is a square matrix of order $n$, $[x]$ is
a column vector consisting of the unknowns $x_1, \ldots, x_n$, and $[b]$ is a column vector
consisting of the constants $b_1, \ldots, b_n$. Let $\mathcal{A}: L \to M$ be a linear transformation
between vector spaces of the same dimension $n$, having for some bases $e_1, \ldots, e_n$
and $f_1, \ldots, f_n$ the matrix $A$. Let $\boldsymbol{b} \in M$ be the vector whose coordinates in the
basis $f_1, \ldots, f_n$ are equal to $b_1, \ldots, b_n$. Then we can interpret the linear system
$A[x] = [b]$ as the equation

$$\mathcal{A}(\boldsymbol{x}) = \boldsymbol{b} \tag{3.45}$$

with the unknown vector $\boldsymbol{x} \in L$ whose coordinates in the basis $e_1, \ldots, e_n$ give the
solution $(x_1, \ldots, x_n)$ to this system.

We have the following obvious alternative: Either the linear transformation
$\mathcal{A}: L \to M$ is an isomorphism, or else it is not. By Theorem 3.70, the first case
is equivalent to the mapping $\mathcal{A}$ being nonsingular. Then the kernel of $\mathcal{A}$ is equal to
$(0)$, and we have the image $\mathcal{A}(L) = M$. Consequently, for an arbitrary vector $\boldsymbol{b} \in M$,
there exists (and indeed, it is unique) a vector $\boldsymbol{x} \in L$ such that $\mathcal{A}(\boldsymbol{x}) = \boldsymbol{b}$, that is,
equation (3.45) is solvable. In particular, from this we obtain Theorem 1.12 and its
corollary. In the second case, the kernel of $\mathcal{A}$ contains a nontrivial vector (the
associated homogeneous system has a nontrivial solution), and the image $\mathcal{A}(L)$ is not all
of the space M, that is, there exists a vector $\boldsymbol{b} \in M$ for which equation (3.45) has no
solution (the system $A[x] = [b]$ is inconsistent).

This assertion, that either equation (3.45) has a solution for every right-hand side
or the associated homogeneous equation has a nontrivial solution, holds also in the
case that $\mathcal{A}$ is a linear transformation (operator) in an infinite-dimensional space
satisfying a certain special condition. Such transformations occur in particular in
the theory of integral equations, where this assertion is given the name Fredholm
alternative.
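Both branches of the alternative can be seen already for $n = 2$. The two matrices below are hypothetical examples chosen for illustration.

```latex
% First case: a nonsingular matrix, |A| = 1.
% The system has exactly one solution for every right-hand side.
\[
A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}:
\qquad
\begin{cases} x_1 + 2x_2 = b_1, \\ x_2 = b_2 \end{cases}
\;\Longrightarrow\;
x_2 = b_2, \quad x_1 = b_1 - 2b_2.
\]
% Second case: a singular matrix, |A| = 0.
% The homogeneous system has the nontrivial solution (2, -1),
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}:
\qquad
A \begin{pmatrix} 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\]
% and the system is inconsistent for the right-hand side (1, 0):
\[
\begin{cases} x_1 + 2x_2 = 1, \\ 2x_1 + 4x_2 = 0 \end{cases}
\;\Longrightarrow\;
x_1 + 2x_2 = 1 \ \text{and} \ x_1 + 2x_2 = 0, \ \text{which is impossible.}
\]
```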


3.6 The Rank of a Linear Transformation 


In this section we shall look at linear transformations $\mathcal{A}: L \to M$ without
making any assumptions about the dimensions $n$ and $m$ of the spaces L and M except
to assume that they are finite. We note that if $e_1, \ldots, e_n$ is any basis of the space
L, then the image of $\mathcal{A}$ is equal to the linear span $\langle \mathcal{A}(e_1), \ldots, \mathcal{A}(e_n) \rangle$. If we choose
some basis $f_1, \ldots, f_m$ of the space M and write the matrix of the transformation $\mathcal{A}$
with respect to the chosen bases, then its columns will consist of the coordinates of the
vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_n)$ in the basis $f_1, \ldots, f_m$, and therefore, the dimension
of the image of $\mathcal{A}$ is equal to the greatest number of linearly independent vectors
among these columns, that is, the rank of the matrix of the linear transformation $\mathcal{A}$.
Thus the rank of the matrix of a linear transformation is independent of the bases
in which it is written, and therefore, we may speak of the rank of a linear
transformation. This allows us to give an equivalent definition of the rank of a linear
transformation that does not depend on the choice of coordinates.


Definition 3.71 The rank of a linear transformation $\mathcal{A}: L \to M$ is the dimension of
the vector space $\mathcal{A}(L)$.


The following theorem establishes a connection between the rank of a linear
transformation and the dimension of its kernel, and it shows a very simple form into
which the matrix of a linear transformation $\mathcal{A}: L \to M$ can be brought through a
suitable choice of bases of both spaces.


Theorem 3.72 For any linear transformation $\mathcal{A}: L \to M$ of finite-dimensional
vector spaces, the dimension of the kernel of $\mathcal{A}$ is equal to $\dim L - r$, where $r$ is the rank
of $\mathcal{A}$. In the two spaces, it is possible to choose bases in which the transformation
$\mathcal{A}$ has a matrix in block-diagonal form

$$\begin{pmatrix} E_r & 0 \\ 0 & 0 \end{pmatrix}, \tag{3.46}$$

where $E_r$ is the identity matrix of order $r$.




Proof Let us denote the kernel of the transformation A by L’, and its image A(L) 
by M’. We begin by proving the relationship 


dimL’ + dimM’ = dimL. (3.47) 


By the definition of the rank of a transformation, we have here r = dim M’, and thus 
the equality (3.47) gives precisely the first assertion of the theorem. 

Let us consider the mapping 𝒜’ : L → M’ that assigns to each vector x ∈ L the
vector y = 𝒜(x) in M’, which by assumption is the image of the mapping 𝒜 :
L → M. It is clear that such a mapping 𝒜’ : L → M’ is also a linear transformation.
In view of Corollary 3.31, we have the decomposition

\[ L = L' \oplus L'', \tag{3.48} \]

where L” is some subspace of L. We now consider the restriction of the transformation
𝒜’ to the subspace L” and denote it by 𝒜” : L” → M’. It is easily seen that the
image of 𝒜” coincides with the image of 𝒜’, that is, is equal to M’. Indeed, since
M’ is the image of the original mapping 𝒜 : L → M, every vector y ∈ M’ can be
represented in the form y = 𝒜(x) with some x ∈ L. But in view of the decomposition
(3.48), we have the equality x = u + v, where u ∈ L’ and v ∈ L”, and moreover, L’
is the kernel of 𝒜, that is, 𝒜(u) = 0’. Consequently, 𝒜(x) = 𝒜(u) + 𝒜(v) = 𝒜(v),
and this means that the vector y = 𝒜(v) is the image of the vector v ∈ L”.

The kernel of the transformation 𝒜” : L” → M’ is equal to (0). Indeed, by definition,
the kernel is equal to L’ ∩ L”, and this intersection consists solely of the null vector,
since the right-hand side of the decomposition (3.48) is a direct sum (see
Corollary 3.15). As a result, we obtain that the image of the transformation
𝒜” : L” → M’ is equal to M’, while its kernel is equal to (0), that is, this
transformation is an isomorphism. By Theorem 3.64, it follows that dim L” = dim M’. On
the other hand, from the decomposition (3.48) and Theorem 3.41, it follows that
dim L’ + dim L” = dim L. Replacing here dim L” by the equal number dim M’, we
obtain the required equality (3.47).

We shall now prove the assertion of the theorem about bringing the matrix of a
linear transformation 𝒜 into the form (3.46). To this end, similar to the
decomposition (3.48) of the space L, we make the decomposition M = M’ ⊕ M”, where M”
is some subspace of M. By the fact proved above that dim L’ = n − r (here n = dim L)
and in view of (3.48), it follows that dim L” = r. Let us now choose in the subspace
L” some basis u_1, ..., u_r and set v_i = 𝒜”(u_i), that is, by definition,
v_i = 𝒜(u_i). As we have seen, the transformation 𝒜” : L” → M’ is an isomorphism,
and therefore, the vectors v_1, ..., v_r form a basis of the space M’, and moreover,
in the bases u_1, ..., u_r and v_1, ..., v_r, the transformation 𝒜” has the identity
E_r as its matrix.

Let us now choose in the space L’ some basis u_{r+1}, ..., u_n and combine it with
the basis u_1, ..., u_r into the unified basis u_1, ..., u_n of the space L. Similarly, we
extend the basis v_1, ..., v_r to an arbitrary basis v_1, v_2, ..., v_m of the space M. What
will be the matrix of the linear transformation 𝒜 in the constructed bases u_1, ..., u_n
and v_1, ..., v_m? It is clear that 𝒜(u_i) = v_i for i = 1, ..., r (by construction, for
these vectors, the transformation 𝒜” is the same as 𝒜).




On the other hand, 𝒜(u_i) = 0’ for i = r + 1, ..., n, since the vectors u_{r+1}, ..., u_n
are contained in the kernel of 𝒜. Writing the coordinates of the vectors 𝒜(u_1), ...,
𝒜(u_n) in the basis v_1, ..., v_m as the columns of a matrix, we obtain that the matrix
of the transformation 𝒜 has the block-diagonal form (3.46).
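The first assertion of Theorem 3.72 can be checked numerically on a small example. The following Python sketch is only an illustration under our own assumptions: the helper names rref and kernel_basis and the sample matrix are hypothetical, and exact rational arithmetic is used so that no rounding occurs. It computes the rank r as the number of pivot columns and a basis of the kernel with one vector per free column, so that r plus the dimension of the kernel equals the number n of columns.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots = []
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # find a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def kernel_basis(rows):
    """Basis of the solutions of A x = 0, one vector per free column."""
    m, pivots = rref(rows)
    n = len(rows[0])
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for fc in free:
        x = [Fraction(0)] * n
        x[fc] = Fraction(1)
        for r, pc in enumerate(pivots):
            x[pc] = -m[r][fc]
        basis.append(x)
    return basis

# sample matrix of a transformation L -> M with dim L = 4 and dim M = 3
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
rank = len(rref(A)[1])
kernel = kernel_basis(A)
print(rank, len(kernel))  # the two numbers add up to dim L = 4
```

Here the second row is twice the first, so the rank is 2 and the kernel is two-dimensional.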


Theorem 3.72 allows us to obtain a simpler and more natural proof of Theo- 
rem 2.63 from the previous section. 

To this end, we note that every matrix is the matrix of some linear transfor- 
mation of vector spaces of suitable dimensions, and in particular, a nonsingular 
square matrix represents an isomorphism of vector spaces of the same dimension. 
For the matrices A, B, and C of Theorem 2.63, let us consider the linear
transformations 𝒜 : M → M’, ℬ : L’ → L, and 𝒞 : L → M, where dim L = dim L’ = n and
dim M = dim M’ = m, having matrices A, B, and C in some bases.

Let us find the rank of the transformation 𝒜𝒞ℬ : L’ → M’. From the equalities
𝒜(M) = M’ and ℬ(L’) = L, it follows that 𝒜𝒞ℬ(L’) = 𝒜(𝒞(L)), whence taking into
account that 𝒜 is an isomorphism, we obtain that dim 𝒜𝒞ℬ(L’) = dim 𝒞(L). By
definition, the dimension of the image of a linear transformation is equal to its rank, which
coincides with the rank of its matrix, written in terms of arbitrary bases, from which
it follows that rk 𝒜𝒞ℬ = rk 𝒞. From this, we finally obtain the required equality
rk ACB = rk C.

We would like to emphasize that the matrix of a transformation is reduced to the 
simple form (3.46) in the case that the spaces L and M are different from each other, 
and it follows that there is no possibility of coordinating their bases, and they are 
thus chosen independently of each other. We shall see below that in other cases (for 
example, if L= M), there is a more natural way of making this assignment when the 
bases of the spaces L and M are not chosen independently (for example, in the case 
L=M, it is simply one and the same basis). Then the question of the simplest form 
of the matrix of a transformation becomes much more complex. 

The statement of Theorem 3.72 on bringing the matrix of a linear transformation 
into the form (3.46) can be reformulated. As we established in Sect. 3.4 (substitution 
formula (3.41)), under a change of bases in the spaces L and M, the matrix A of a 
linear transformation 𝒜 : L → M is replaced by the matrix A’ = D^{-1}AC, where C
and D are the transition matrices for the new bases in the spaces L and M. We know 
that the matrices C and D are nonsingular, and conversely, any nonsingular square 
matrix of the appropriate order can be taken as the transition matrix to a new basis. 
Therefore, Theorem 3.72 yields the following corollary. 


Corollary 3.73 For every matrix A of type (m, n), there exist nonsingular square
matrices C and D of orders n and m such that the matrix D^{-1}AC has the form
(3.46).
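Corollary 3.73 can be made concrete by elementary row and column operations. The Python sketch below is a minimal illustration and not the construction used in the proof above: the names identity, matmul, and normal_form are our own, P records the row operations (so it plays the role of D^{-1}), and Q records the column operations (so it plays the role of C). Both are products of elementary matrices and hence nonsingular.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def normal_form(A):
    """Return (P, Q, r) such that P*A*Q is the block matrix diag(E_r, 0)."""
    m, n = len(A), len(A[0])
    M = [[Fraction(v) for v in row] for row in A]
    P, Q = identity(m), identity(n)
    r = 0
    while r < min(m, n):
        # look for a nonzero entry in the not yet reduced lower-right block
        pos = next(((i, j) for i in range(r, m) for j in range(r, n)
                    if M[i][j] != 0), None)
        if pos is None:
            break
        i, j = pos
        # move it to position (r, r): swap rows of M and P, columns of M and Q
        M[r], M[i] = M[i], M[r]
        P[r], P[i] = P[i], P[r]
        for row in M + Q:
            row[r], row[j] = row[j], row[r]
        # scale row r so that the pivot becomes 1
        piv = M[r][r]
        M[r] = [v / piv for v in M[r]]
        P[r] = [v / piv for v in P[r]]
        # clear the rest of column r by row operations
        for k in range(m):
            if k != r and M[k][r] != 0:
                f = M[k][r]
                M[k] = [a - f * b for a, b in zip(M[k], M[r])]
                P[k] = [a - f * b for a, b in zip(P[k], P[r])]
        # clear the rest of row r by column operations
        for c in range(n):
            if c != r and M[r][c] != 0:
                f = M[r][c]
                for row in M + Q:
                    row[c] -= f * row[r]
        r += 1
    return P, Q, r

# sample matrix of type (4, 3); its first row minus its third equals its fourth
A = [[1, 2, 3],
     [2, 4, 6],
     [1, 1, 1],
     [0, 1, 2]]
P, Q, r = normal_form(A)
N = matmul(matmul(P, A), Q)
print(r)
```

For this sample matrix the rank is 2, and N is the matrix of type (4, 3) whose only nonzero entries are the two leading diagonal ones.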


3.7 Dual Spaces 


In this section, we shall examine the notion of a linear transformation 𝒜 : L → M in
the simplest case of dimM = 1. As a result, we shall arrive at a concept very close 




to that with which we began our course in Sect. 1.1, but now reformulated more 
abstractly, in terms of vector spaces. If dimM = 1, then after selecting a basis in M 
(that is, some nonzero vector e), we can express any vector in this space in the form
αe, where α is a scalar (real, complex, or from an arbitrary field K, depending on the
interpretation that the reader wishes to give to this term). Identifying αe with α, we
may consider in place of M the collection of scalars (ℝ, ℂ, or K). In connection with
this, we shall in this case denote the vector space ℒ(L, M) introduced in Sect. 3.3 by
ℒ(L, K). It is called the space of linear functions on L.

Therefore, a linear function on a space L is a mapping f : L → K that assigns to
each vector x ∈ L the number f(x) and satisfies the conditions

\[ f(x + y) = f(x) + f(y), \qquad f(\alpha x) = \alpha f(x) \]

for all vectors x, y ∈ L and scalars α ∈ K.


Example 3.74 If L = K^n is the space of rows of length n with elements in the field
K, then the notion of linear function introduced above coincides with the concept 
introduced in Sect. 1.1. 


Example 3.75 Let L be the space of continuous functions on the interval [a, b] taking
real or complex values. For every function x(t) in L, we set

\[ f_g(x) = \int_a^b g(t)\,x(t)\,dt, \tag{3.49} \]

where g(t) is some fixed function in L. It is clear that f_g(x) is a linear function on L.
We observe that in going through all functions g(t), we shall obtain by formula
(3.49) an infinite number of linear functions on L, that is, elements of the space
ℒ(L, K), where K = ℝ or ℂ. However, it is not possible to obtain all linear functions
on L with the help of formula (3.49). For example, let s ∈ [a, b] be some fixed point
on this interval. Consider the mapping L → K that assigns to each function x(t) ∈ L
its value at the point s. It is then clear that such a mapping is a linear function on L,
but it is represented in the form (3.49) for no function g(t).


Definition 3.76 If L is finite-dimensional, the space L(L, K) is called the dual to L 
and is denoted by L*. 


Remark 3.77 (The infinite-dimensional case) For an infinite-dimensional vector 
space L (for example, that considered in Example 3.75 of the space of continu- 
ous functions on an interval), the dual space L* is defined to be the space not of all 
linear functions, but only of those satisfying the particular additional condition of 
continuity (in the case of a finite-dimensional space, the requirement of continuity 
is automatically satisfied). 

The study of linear functions on infinite-dimensional vector spaces turns out to 
be useful in many questions in analysis and mathematical physics. In this direction, 
the remarkable idea arose to treat arbitrary linear functions as if they had been given 




in the form (3.49), where g(t) is a certain “generalized function” that does not, in
general, belong to the initial space L. This leads to new and interesting results. 

For example, if we take as L the space of functions that are differentiable on the
interval [a, b] and equal to zero at the endpoints, then for a differentiable function
g(t), the rule of integration by parts can be written in the form f_{g’}(x) = −f_g(x’).
But if the derivative g’(t) does not exist, then it is possible to define a new,
“generalized,” function ψ(t) by f_ψ(x) = −f_g(x’). In this case, it is clear that ψ(t) = g’(t)
if the derivative g’(t) exists and is continuous. Thus it is possible to define
derivatives of arbitrary functions (including discontinuous and even generalized
functions).

For example, let us suppose that our interval [a, b] contains in its interior the
point 0 and let us calculate the derivative of the function h(t) that is equal to zero
for t < 0 and to 1 for t > 0, and consequently has a discontinuity at the point t = 0.
By definition, for any function x(t) in L, we obtain the equality

\[ f_{h'}(x) = -f_h(x') = -\int_a^b h(t)\,x'(t)\,dt = -\int_0^b x'(t)\,dt = x(0) - x(b) = x(0), \]

since x(b) = 0. Consequently, the derivative h’(t) is a generalized function that
assigns to each function x(t) in L its value at the point t = 0 (see the footnote on
the Dirac delta function).
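The computation above can be reproduced exactly for a polynomial test function. In the Python sketch below, the interval [−1, 2], the function x(t) = (t + 1)(2 − t), and the helper names are our own illustrative choices; polynomials in t are stored as lists of coefficients in increasing degree. Since h(t) vanishes for t < 0 and equals 1 for t > 0, the integral of h(t)x’(t) over [a, b] reduces to the integral of x’(t) over [0, b].

```python
from fractions import Fraction

def poly_eval(p, t):
    """Value at t of the polynomial with coefficient list p."""
    return sum(c * t ** k for k, c in enumerate(p))

def poly_derivative(p):
    return [k * c for k, c in enumerate(p)][1:]

def poly_integral(p, lo, hi):
    """Exact integral of the polynomial p over [lo, hi]."""
    anti = [Fraction(0)] + [Fraction(c, k + 1) for k, c in enumerate(p)]
    return poly_eval(anti, hi) - poly_eval(anti, lo)

a, b = -1, 2
x = [2, 1, -1]   # x(t) = 2 + t - t^2 = (t + 1)(2 - t), zero at both endpoints

# -(integral of h * x' over [a, b]) = -(integral of x' over [0, b])
value = -poly_integral(poly_derivative(x), 0, b)
print(value, poly_eval(x, 0))
```

Both printed numbers agree, in accordance with the equality f_{h’}(x) = x(0).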


We now return to exclusive consideration of the finite-dimensional case. 


Theorem 3.78 If a vector space L is of finite dimension, then the dual space L* has
the same dimension.


Proof Let e_1, ..., e_n be any basis of the space L. Let us consider vectors f_i ∈ L*,
i = 1, ..., n, where f_i is defined as a linear function that assigns to a vector

\[ x = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n \tag{3.50} \]

its ith coordinate in the basis e_1, ..., e_n, that is,

\[ f_1(x) = \alpha_1, \quad \dots, \quad f_n(x) = \alpha_n. \tag{3.51} \]

We will thus obtain n vectors in the dual space. Let us verify that they form a basis
of that space.

Let f = β_1 f_1 + ⋯ + β_n f_n. Then applying the function f to the vector x,
defined by the formula (3.50), we obtain

\[ f(x) = \alpha_1 \beta_1 + \alpha_2 \beta_2 + \cdots + \alpha_n \beta_n. \tag{3.52} \]


Footnote to Remark 3.77: The generalized function h’(t) obtained there is called a Dirac
delta function in honor of the English physicist Paul Adrien Maurice Dirac, who was the
first to use generalized functions (toward the end of the 1920s) in his work on quantum
mechanics.




In particular, assuming x = e_i, we obtain that f(e_i) = β_i. Thus the equality f = 0
(where 0 is the null vector of the space L*, that is, a linear function on L identically
equal to zero) means that f(x) = 0 for every vector x ∈ L. It is clear that this is
the case if and only if β_1 = 0, ..., β_n = 0. By this we have established the linear
independence of the functions f_1, ..., f_n. By equality (3.52), every linear function f
on L can be expressed in the form β_1 f_1 + ⋯ + β_n f_n with coefficients β_i = f(e_i).
This means that the functions f_1, ..., f_n form a basis of L*, from which it follows
that dim L = dim L* = n.


The basis f_1, ..., f_n of the dual space L* constructed according to formula
(3.51) is called the dual to the basis e_1, ..., e_n of the original vector space L. It
is clear that it is defined by the formula

\[ f_i(e_i) = 1, \qquad f_i(e_j) = 0 \quad \text{for } j \ne i. \]
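For the space of rows of length n over the rational numbers, the dual basis can be computed explicitly. The sketch below rests on an observation that is not stated in the text but follows from the defining formula: if the basis vectors e_1, ..., e_n are written as the columns of a matrix B, then the coefficient rows of f_1, ..., f_n are the rows of the inverse matrix of B. The function names inverse, dual_basis, and apply and the sample basis are our own.

```python
from fractions import Fraction

def inverse(B):
    """Invert a square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(B)
    # augmented matrix [B | E]
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(B)]
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            raise ValueError("the given vectors do not form a basis")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]

def dual_basis(basis):
    """Coefficient rows of f_1, ..., f_n for a basis e_1, ..., e_n."""
    n = len(basis)
    # the columns of B are the basis vectors
    B = [[basis[j][i] for j in range(n)] for i in range(n)]
    return inverse(B)

def apply(f, x):
    """Value f(x) of the linear function with coefficient row f."""
    return sum(a * b for a, b in zip(f, x))

basis = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
dual = dual_basis(basis)

# the vector x = 2*e_1 + 3*e_2 - e_3 has coordinates (2, 3, -1) in this basis
x = [2 * u + 3 * v - w for u, v, w in zip(*basis)]
coordinates = [apply(f, x) for f in dual]
print(coordinates)
```

Applying f_i to x returns the ith coordinate of x in the chosen basis, as in formula (3.51).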


We observe that L and L*, like any two finite-dimensional vector spaces of the 
same dimension, are isomorphic. (For infinite-dimensional vector spaces, this is not 
in general the case, as in the case examined in Example 3.75 of the space L of con- 
tinuous functions on an interval, for which L and L* are not isomorphic.) However, 
the construction of an isomorphism between them requires the choice of a basis 
e_1, ..., e_n in L and a basis f_1, f_2, ..., f_n in L*. Thus between L and L* there does
not exist a “natural” isomorphism independent of the choice of basis. If we repeat 
the process of passage to the dual space twice, we will obtain the space (L*)*, for 
which it is easy to construct an isomorphism with the original space L without re- 
sorting to the choice of a special basis. The space (L*)* is called the second dual 
space to L and is denoted by L**. 

Our immediate objective is to define a linear transformation 𝒜 : L → L** that is
an isomorphism. To do so, we need to define 𝒜(x) for every vector x ∈ L. The vector
𝒜(x) must lie in the space L**, that is, it must be a linear function on the space L*.
Since 𝒜(x) is an element of the second dual space L**, it follows by definition that
𝒜(x) is a linear transformation that assigns to each element f ∈ L* (which itself
is a linear function on L) some number, denoted by 𝒜(x)(f). We will define this
number by the natural condition

\[ \mathcal{A}(x)(f) = f(x) \quad \text{for all } x \in L,\ f \in L^*. \tag{3.53} \]


The transformation 𝒜 is in ℒ(L, L**) (its linearity is obvious). To verify that 𝒜
is a bijection, we can use any basis e_1, ..., e_n in L and the dual basis f_1, ..., f_n
in L*. Then, as is easy to verify, 𝒜 is the composition of two isomorphisms: the
isomorphism L → L* constructed in the proof of Theorem 3.78 and the analogous
isomorphism L* → L**, whence it follows that 𝒜 is itself an isomorphism.

The isomorphism L → L** determined by condition (3.53) shows that the vector
spaces L and L* play symmetric roles: each of them is the dual of the other. To point
out this symmetry more clearly, we shall find it convenient to write the value f(x),
where x ∈ L and f ∈ L*, in the form (x, f). The expression (x, f) possesses the
following easily verified properties:




1. (x_1 + x_2, f) = (x_1, f) + (x_2, f);
2. (x, f_1 + f_2) = (x, f_1) + (x, f_2);
3. (αx, f) = α(x, f);
4. (x, αf) = α(x, f);
5. if (x, f) = 0 for all x ∈ L, then f = 0;
6. if (x, f) = 0 for all f ∈ L*, then x = 0.


Conversely, if for two vector spaces L and M, the function (x, y) is defined, where
x ∈ L and y ∈ M, taking numeric values and satisfying conditions (1)-(6), then as is
easily verified, L ≅ M* and M ≅ L*. We shall rely heavily on this fact in Chap. 6 in
our study of bilinear forms.


Definition 3.79 Let L’ be a subspace of the vector space L. The set of all f ∈ L*
such that f(x) = 0 for all x ∈ L’ is called the annihilator of the subspace L’ and is
denoted by (L’)^a.


It follows at once from this definition that (L’)^a is a subspace of L*. Let us
determine its dimension. Let dim L = n and dim L’ = r. We choose a basis e_1, ..., e_r of
the subspace L’, extend it to a basis e_1, ..., e_n of the entire space L, and consider the
dual basis f_1, ..., f_n of L*. From the definition of the dual basis, it follows easily
that a linear function f ∈ L* belongs to (L’)^a if and only if f ∈ ⟨f_{r+1}, ..., f_n⟩. In
other words, (L’)^a = ⟨f_{r+1}, ..., f_n⟩, and this implies that

\[ \dim (L')^a = \dim L - \dim L'. \tag{3.54} \]


If we now consider the natural isomorphism L** → L defined above and with its
help identify these spaces, then it is possible to apply the construction given above
to the annihilator (L’)^a and examine the obtained subspace ((L’)^a)^a in L. From the
definition, it follows that L’ ⊂ ((L’)^a)^a. From the derived relationship (3.54) for
dimension, we obtain that dim ((L’)^a)^a = n − (n − r) = r, and by Theorem 3.24, it
follows that ((L’)^a)^a = L’.

At the same time, we obtain that the subspace L’ consists of all vectors x ∈ L for
which

\[ f_{r+1}(x) = 0, \quad \dots, \quad f_n(x) = 0. \tag{3.55} \]

Thus an arbitrary subspace L’ is defined by some system of linear equations (3.55).
This fact is well known in the case of lines and planes (dim L’ = 1, 2) in
three-dimensional space from courses in analytic geometry. In the general case, this
assertion is the converse of what was proved in Example 3.8 (p. 84).
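As a small worked illustration (the particular subspace is our own choice and does not appear in the text), take L = K^3 with coordinates x_1, x_2, x_3 and let L’ be the plane spanned by (1, 0, 1) and (0, 1, 1). A linear function f(x) = c_1 x_1 + c_2 x_2 + c_3 x_3 vanishes on L’ exactly when it vanishes on both spanning vectors:

```latex
\[
  c_1 + c_3 = 0, \qquad c_2 + c_3 = 0
  \quad\Longrightarrow\quad
  (c_1, c_2, c_3) = c\,(1, 1, -1), \quad c \in K .
\]
\[
  (L')^a = \langle x_1 + x_2 - x_3 \rangle, \qquad
  \dim (L')^a = 1 = 3 - 2 = \dim L - \dim L' ,
\]
\[
  L' = \{\, x \in K^3 : x_1 + x_2 - x_3 = 0 \,\}.
\]
```

The last line is the system (3.55) for this plane, which here consists of a single equation.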

We have defined the correspondence L’ ↦ (L’)^a between subspaces L’ ⊂ L and
(L’)^a ⊂ L*, which in view of the equality ((L’)^a)^a = L’ is a bijection. We shall denote
this correspondence by ε and call it duality. Let us now point out some simple
properties of this correspondence.

If L’ and L” are two subspaces of L, then

\[ \varepsilon(L' + L'') = \varepsilon(L') \cap \varepsilon(L''). \tag{3.56} \]




In other words, this means that

\[ (L' + L'')^a = (L')^a \cap (L'')^a. \tag{3.57} \]

Indeed, let f ∈ (L’)^a ∩ (L”)^a. By the definition of sum, for every vector x ∈ L’ + L”
we obtain the representation x = x’ + x”, where x’ ∈ L’ and x” ∈ L”, whence it
follows that f(x) = f(x’) + f(x”) = 0, since f ∈ (L’)^a and f ∈ (L”)^a. Consequently,
f ∈ (L’ + L”)^a, and thus we have proved the inclusion (L’)^a ∩ (L”)^a ⊂ (L’ + L”)^a.
Let us now prove the reverse inclusion. Let f ∈ (L’ + L”)^a, that is, f(x) = 0 for
all vectors x = x’ + x”, where x’ ∈ L’ and x” ∈ L”; in particular, f(x) = 0 for all
vectors in both subspaces L’ and L”, that is, by the definition of the annihilator, we
obtain the relationship f ∈ (L’)^a and f ∈ (L”)^a. Thus f ∈ (L’)^a ∩ (L”)^a, that is,
(L’ + L”)^a ⊂ (L’)^a ∩ (L”)^a. From this, by the previous inclusion, we obtain
relationship (3.57), and hence the relationship (3.56).

As a result, we may formulate the following almost obvious duality principle. 
Later, we shall prove deeper versions of this principle. 


Proposition 3.80 (Duality principle) If for all vector spaces of a given finite
dimension n over a given field K, a theorem is proven in whose formulation there appear
only the notions of subspace, dimension, sum, and intersection, then for all such
spaces, a dual theorem holds, obtained from the initial theorem via the following
substitution:


dimension r              dimension n − r
intersection L’ ∩ L”     sum L’ + L”
sum L’ + L”              intersection L’ ∩ L”


Finally, we shall examine the linear transformation 𝒜 : L → M. Here, as with
all functions, linear functions are written in reverse order to the order of the sets
on which they are defined; see p. xv in the Introduction. Using the notation of that
section, we define the set T = K and restrict the mapping 𝔉(M, K) → 𝔉(L, K)
constructed there to the subset M* ⊂ 𝔉(M, K), the space of linear functions on M. We
observe that the image of M* is contained in the space L* ⊂ 𝔉(L, K), that is, it consists
of linear functions on L. We shall denote this mapping by 𝒜*. According to the
definition on page xv, we define a linear transformation 𝒜* : M* → L* by determining,
for each vector g ∈ M*, its value from the equality

\[ (\mathcal{A}^*(g))(x) = g(\mathcal{A}(x)) \quad \text{for all } x \in L. \tag{3.58} \]


A trivial verification shows that A*(g) is a linear function on L, and A* is a linear 
transformation of M* to L*. The transformation A* thus constructed is called the 
dual transformation of A. Using our earlier notation to write f(x) as (x, f), we 
can write the definition (3.58) in the following form: 


\[ (\mathcal{A}^*(y), x) = (y, \mathcal{A}(x)) \quad \text{for all } x \in L \text{ and } y \in M^*. \]


Let us choose in the space L some basis e_1, ..., e_n, and in M, a basis f_1, ..., f_m,
and also dual bases e_1*, ..., e_n* in L* and f_1*, ..., f_m* in M*.




Theorem 3.81 The matrix of a transformation 𝒜 : L → M written in terms of
arbitrary bases of the spaces L and M and the matrix of the dual transformation
𝒜* : M* → L* written in the dual bases in the spaces M* and L* are transposes
of each other.


Proof Let A = (a_{ij}) be the matrix of the transformation 𝒜 in the bases e_1, ..., e_n
and f_1, ..., f_m. By formula (3.23), this means that

\[ \mathcal{A}(e_i) = \sum_{j=1}^{m} a_{ji} f_j, \qquad i = 1, \dots, n. \tag{3.59} \]

By the definition of the dual transformation (formula (3.58)), for every linear
function f ∈ M*, the following equality holds:

\[ (\mathcal{A}^*(f))(e_i) = f(\mathcal{A}(e_i)), \qquad i = 1, \dots, n. \]

If e_1*, ..., e_n* is the basis of L* dual to the basis e_1, ..., e_n of L, and f_1*, ..., f_m* is
the basis of M* dual to the basis f_1, ..., f_m of M, then 𝒜*(f_k*) is a linear function
on L, as defined in (3.58). In particular, applying 𝒜*(f_k*) to the vector e_i ∈ L, taking
into account (3.58) and (3.59), we obtain

\[ (\mathcal{A}^*(f_k^*))(e_i) = f_k^*(\mathcal{A}(e_i)) = f_k^*\Bigl(\sum_{j=1}^{m} a_{ji} f_j\Bigr) = \sum_{j=1}^{m} a_{ji} f_k^*(f_j), \]

and this number is equal to a_{ki} by the definition of the dual basis. It is obvious
that this linear function on L is the function a_{k1} e_1* + ⋯ + a_{kn} e_n*. Thus we obtain
that the transformation 𝒜* assigns to the vector f_k* ∈ M* the vector

\[ \mathcal{A}^*(f_k^*) = \sum_{i=1}^{n} a_{ki} e_i^*, \qquad k = 1, \dots, m, \tag{3.60} \]

of the space L*. Comparing formulas (3.59) and (3.60), we conclude that in the
given bases, the matrix of the transformation 𝒜* is equal to A* = (a_{ji}), that is, the
transpose of the matrix of the transformation 𝒜.
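Theorem 3.81 can be checked on a numerical example. In the following Python sketch (the matrix, the vectors, and the function names are our own illustrative choices), a vector x ∈ L is given by its coordinates, a linear function g ∈ M* by its coordinates in the dual basis, and the value g(y) is the sum of the products of corresponding coordinates. The dual transformation is applied by multiplying by the transposed matrix, and the defining equality (3.58) is compared on both sides.

```python
def apply_matrix(A, x):
    """Coordinates of the image of x under the transformation with matrix A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def evaluate(g, y):
    """Value g(y) for g given by its coordinates in the dual basis."""
    return sum(a * b for a, b in zip(g, y))

A = [[1, 2, 0],
     [0, -1, 3]]      # matrix of a transformation L -> M, dim L = 3, dim M = 2
g = [5, 7]            # a linear function on M
x = [2, -1, 4]        # a vector of L

lhs = evaluate(apply_matrix(transpose(A), g), x)   # (A*(g))(x)
rhs = evaluate(g, apply_matrix(A, x))              # g(A(x))
print(lhs, rhs)
```

Both sides give the same number, as the equality (3.58) requires when the matrix of the dual transformation is the transpose of A.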


If we are given two linear transformations of vector spaces, 𝒜 : L → M and ℬ :
M → N, then we can define their composition ℬ𝒜 : L → N, which means that its
dual transformation is also defined, and is given by (ℬ𝒜)* : N* → L*. From the
condition (3.58), an immediate verification easily leads to the relation

\[ (\mathcal{B}\mathcal{A})^* = \mathcal{A}^* \mathcal{B}^*. \tag{3.61} \]


Together with Theorem 3.81, we thus obtain a new proof of equality (2.57), and 
moreover, now no formulas are used; relationship (2.57) is obtained on the basis of 
general notions. 


3.8 Forms and Polynomials in Vectors 


A natural generalization of the concept of linear function on a vector space is the 
notion of form. It plays an important role in many branches of mathematics and in 
mechanics and physics. 

In the sequel, we shall assume that the vector space L on which we want to 
define a form is defined over an arbitrary field K. In the space L, we choose a basis 
e_1, ..., e_n. Then every vector x ∈ L is uniquely defined by the choice of coordinates
(x_1, ..., x_n) in the given basis.


Definition 3.82 A function F : L → K is called a polynomial on the space L if F(x)
can be written as a polynomial in the coordinates x_1, ..., x_n of the vector x, that is,
F(x) is a finite sum of expressions of the form

\[ c\, x_1^{k_1} \cdots x_n^{k_n}, \tag{3.62} \]

where k_1, ..., k_n are nonnegative integers and the coefficient c is in K. The
expression (3.62) is called a monomial in the space L, while the number k = k_1 + ⋯ + k_n
is called its degree. The degree of F(x) is the maximum over the degrees of the
monomials that enter into it with nonzero coefficients c.


Let us note that for n > 1, a polynomial F(x) of degree k can have several differ- 
ent monomials (3.62) of the same degree entering into it with nonzero coefficients c. 


Definition 3.83 A polynomial F(x) on a vector space L is said to be homogeneous 
of degree k or a form of degree k (or frequently k-form) if every monomial entering 
into F(x) with nonzero coefficients is of degree k. 


The definitions we have given require a bit of comment; indeed, we introduced 
them having chosen a particular basis of the space L, and now we need to show that 
everything remains as defined under a change of basis; that is, if the function F(x) is 
a polynomial (or form) in the coordinates of the vector x in one basis, then it should 
be a polynomial (or form) of the same degree in the coordinates of the vector x in 
any other basis. Indeed, using the formula for changing the coordinates of a vec- 
tor, that is, substituting relationships (3.35) into (3.62), it is easily seen that under a 
change of basis, every monomial (3.62) of degree k is converted to a sum of mono- 
mials of the same degree. Consequently, a change of basis transforms the monomial 
(3.62) of degree k into a certain form F’(x) of degree k’ ≤ k. The reason for the
inequality here is that monomials entering into this form might cancel, resulting in a
leading-degree term that is equal to zero. However, it is easy to see that such cannot
occur. For example, using back-substitution, that is, substituting relationship (3.37)
into the form F’(x), we will clearly again obtain the monomial (3.62). Therefore,
k ≤ k’. Thus we have established the equality k’ = k. This establishes everything
that we needed to prove. 

Forms of degree k = 0 are simply the constant functions, which assign to every 
vector x ∈ L one and the same number. Forms of degree k = 1 are said to be linear,




and these are precisely the linear functions on the space L that we studied in detail 
in the previous section. 

Forms of degree k = 2 are called quadratic; they play an especially important 
role in courses in linear algebra as well as in many other branches of mathematics 
and physics. In our course, an entire chapter will be devoted to quadratic forms 
(Chap. 6). 

We observe that we have in fact already encountered forms of arbitrary degree, 
as shown in the following example. 


Example 3.84 Let F(x_1, ..., x_m) be a multilinear function on m rows of length n
(see the definition on p. 51). Since the space K^n of rows of length n is isomorphic
to every n-dimensional vector space, we may view F(x_1, ..., x_m) as a multilinear
function in m vectors of the space L. Setting all the vectors x_1, ..., x_m in L equal to
x, then by Theorem 2.29, we obtain on the space L the form F(x) = F(x, ..., x) of
degree m.


Let us denote by F_k(x) the sum of all monomials of degree k ≥ 0 appearing in
the polynomial F(x) for a given choice of basis e_1, ..., e_n. Thus F_k(x) is a form of
degree k, and we obtain the expression

\[ F(x) = F_0 + F_1(x) + \cdots + F_m(x), \tag{3.63} \]

in which F_k(x) = 0 if there are no terms of degree k. For every form F_k(x) of degree
k, the equation

\[ F_k(\lambda x) = \lambda^k F_k(x) \tag{3.64} \]

is satisfied for every scalar λ ∈ K and every vector x ∈ L (clearly, it suffices to verify
(3.64) for a monomial). Substituting in relation (3.63) the vector λx in place of x,
we obtain

\[ F(\lambda x) = F_0 + \lambda F_1(x) + \cdots + \lambda^m F_m(x). \tag{3.65} \]

From this, it follows easily that the forms F_k in the representation (3.63) are uniquely
determined by the polynomial F.
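The decomposition (3.63) and the relations (3.64) and (3.65) are easy to experiment with. In the Python sketch below, a polynomial in fixed coordinates is stored as a dictionary that maps each exponent tuple (k_1, ..., k_n) to its coefficient c; this representation, the helper names, and the sample polynomial are our own assumptions and are not taken from the text.

```python
from fractions import Fraction

def homogeneous_parts(F):
    """Split F into forms: the result maps each degree k to the form F_k."""
    parts = {}
    for exps, c in F.items():
        if c != 0:
            parts.setdefault(sum(exps), {})[exps] = c
    return parts

def evaluate(F, x):
    """Value of the polynomial F at the vector with coordinates x."""
    total = 0
    for exps, c in F.items():
        term = c
        for xi, k in zip(x, exps):
            term *= xi ** k
        total += term
    return total

# F(x) = 7 + 2*x1 - x2 + 3*x1*x2 + x2^2 + 5*x1^3
F = {(0, 0): 7, (1, 0): 2, (0, 1): -1, (1, 1): 3, (0, 2): 1, (3, 0): 5}
parts = homogeneous_parts(F)

x, lam = (2, -3), Fraction(1, 2)
scaled = tuple(lam * xi for xi in x)

# right-hand side of (3.65): F_0 + lam*F_1(x) + ... + lam^m * F_m(x)
rhs = sum(lam ** k * evaluate(Fk, x) for k, Fk in parts.items())
print(evaluate(F, scaled) == rhs)
```

Each part in parts satisfies (3.64) separately, and summing the scaled parts reproduces F(λx) as in (3.65).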

It is not difficult to see that the totality of all polynomials on the space L form a 
vector space, which we shall denote by A. This notation is connected with the fact 
that the totality of all polynomials forms not only a vector space, but a richer and 
more complex algebraic structure called an algebra. This means that in addition to 
the operations of a vector space, in A is also defined the operation of the product 
of every pair of elements satisfying certain conditions; see the definition on p. 370. 
However, we shall not yet use this fact and will continue to view A solely as a vector 
space. 

Let us note that the space A is infinite-dimensional. Indeed, it suffices to consider
the infinite sequence of forms F_k(x) = x_i^k, where k runs through the natural
numbers, and the form F_k(x) assigns to a vector x with coordinates (x_1, ..., x_n) the kth
power of its ith coordinate (the number i may be fixed).




The totality of forms of fixed degree k on a space L forms a subspace A_k ⊂ A.
Here A_0 = K, and A_1 coincides with the space L* of linear functions on L. The
decomposition (3.63) could be interpreted as a decomposition of the space A as the
direct sum of an infinite number of subspaces A_k (k = 0, 1, ...) if we were to define
such a notion. In the field of algebra, the accepted name for this is graded algebra.

In the remainder of this section we shall look at two examples that use the con- 
cepts just introduced. Here we shall use the rules for differentiating functions of 
several variables (as applied to polynomials), which is something that might be new 
to some readers. However, reference to the formulas thus obtained will occur only 
at isolated places in the course, which can be omitted if desired. We present these 
arguments only to emphasize the connection with other areas of mathematics. 

Let us begin with reasoning that uses a certain coordinate system, that is, a choice
of some basis in the space L. For the polynomial F(x_1, ..., x_n), its partial
derivatives ∂F/∂x_i are defined, which are again polynomials. It is easy to see that the
mapping that assigns to every polynomial F ∈ A the polynomial ∂F/∂x_i determines
a linear transformation A → A, which we denote by ∂/∂x_i. From these
transformations we obtain new transformations A → A of the form

\[ \mathcal{D} = \sum_{i=1}^{n} P_i \frac{\partial}{\partial x_i}, \tag{3.66} \]

where the P_i are arbitrary polynomials. Linear transformations of the form (3.66)
are called first-order differential operators. In analysis and geometry one considers
their analogues, whereby the P_i are functions of a much more general class and the
space A is correspondingly enlarged. From the simplest properties of differentiation,
it follows that the linear operators 𝒟 defined by formula (3.66) exhibit the property

\[ \mathcal{D}(FG) = F\,\mathcal{D}(G) + G\,\mathcal{D}(F) \tag{3.67} \]

for all F ∈ A and G ∈ A.

Let us show that the converse also holds: an arbitrary linear transformation 𝒟 :
A → A satisfying condition (3.67) is a first-order differential operator. To this end,
we observe first that from the relation (3.67), it follows that 𝒟(1) = 0. Indeed,
setting in (3.67) the polynomial F = 1, we obtain the equality 𝒟(1·G) = 1·𝒟(G) +
G𝒟(1). Canceling the term 𝒟(G) on the left- and right-hand sides, we see that
G𝒟(1) = 0, and having selected as G an arbitrary nonzero polynomial (even if
only G = 1), we obtain 𝒟(1) = 0.

Let us now determine a linear transformation 𝒟’ : A → A according to the
formula

\[ \mathcal{D}' = \mathcal{D} - \sum_{i=1}^{n} P_i \frac{\partial}{\partial x_i}, \quad \text{where } P_i = \mathcal{D}(x_i). \]

It is easily seen that 𝒟’(1) = 0 and 𝒟’(x_i) = 0 for all indices i = 1, ..., n. We
observe as well that the transformation 𝒟’, like 𝒟, satisfies the relationship (3.67),



whence it follows that if 𝒟’(F) = 0 and 𝒟’(G) = 0, then also 𝒟’(FG) = 0.
Therefore, 𝒟’(F) = 0 if the polynomial F is the product of any two monomials from the
collection 1, x_1, ..., x_n. It is obvious that into the collection of such polynomials
enter all monomials of degree two, and consequently, for them we have 𝒟’(F) = 0.
Proceeding by induction, we can show that 𝒟’(F) = 0 for all monomials in A_k
for all k, and therefore, this holds in general for all forms F_k ∈ A_k. Finally, we recall
that an arbitrary polynomial F ∈ A is the sum of a finite number of homogeneous
polynomials F_k ∈ A_k. Therefore, 𝒟’(F) = 0 for all F ∈ A, which means that the
transformation 𝒟 has the form (3.66).

The relationship (3.67) gives the definition of a first-order differential operator in 
a way that does not depend on the coordinate system, that is, on the choice of basis 
e_1, ..., e_n of the space L.


Example 3.85 Let us consider the differential operator

\[ \mathcal{D} = \sum_{i=1}^{n} x_i \frac{\partial}{\partial x_i}. \]

It is clear that 𝒟(x_i) = x_i for all i = 1, ..., n, from which it follows that for the
restriction to the subspace A_1 ⊂ A, the linear transformation 𝒟 : A_1 → A_1 becomes
the identity, that is, equal to ℰ. We shall prove that for the restriction to the subspace
A_k ⊂ A, the transformation 𝒟 : A_k → A_k coincides with kℰ. We shall proceed by
induction on k. We have already analyzed the case k = 1, and the case k = 0 is
obvious. Consider now polynomials x_i G, where G ∈ A_{k−1} and i = 1, ..., n. Then
from (3.67), we have the equality 𝒟(x_i G) = x_i 𝒟(G) + G𝒟(x_i). We have seen that
𝒟(x_i) = x_i, and by induction, we may assume that 𝒟(G) = (k − 1)G. As a result,
we obtain the equality

\[ \mathcal{D}(x_i G) = x_i (k - 1) G + G x_i = k x_i G. \]

But every polynomial F ∈ A_k can be written as the sum of polynomials of the form
x_i G_i with suitable G_i ∈ A_{k−1}. Thus for an arbitrary polynomial F ∈ A_k, we obtain
the relationship 𝒟(F) = kF. Written in coordinates, this takes the form

\[ \sum_{i=1}^{n} x_i \frac{\partial F}{\partial x_i} = kF, \qquad F \in A_k, \tag{3.68} \]

and is called Euler’s identity.
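Euler’s identity (3.68) can be verified symbolically for a particular form. The Python sketch below again stores a polynomial as a dictionary from exponent tuples to coefficients; this representation, the helper names partial, times_x, add, and euler_operator, and the sample form of degree 3 are our own illustrative assumptions.

```python
def partial(F, i):
    """Partial derivative of F with respect to the coordinate with index i."""
    dF = {}
    for exps, c in F.items():
        k = exps[i]
        if k > 0:
            new = exps[:i] + (k - 1,) + exps[i + 1:]
            dF[new] = dF.get(new, 0) + c * k
    return dF

def times_x(F, i):
    """Product of F with the coordinate with index i."""
    out = {}
    for exps, c in F.items():
        new = exps[:i] + (exps[i] + 1,) + exps[i + 1:]
        out[new] = out.get(new, 0) + c
    return out

def add(F, G):
    out = dict(F)
    for exps, c in G.items():
        out[exps] = out.get(exps, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def euler_operator(F, n):
    """Sum over all coordinates of x_i times the partial derivative of F."""
    result = {}
    for i in range(n):
        result = add(result, times_x(partial(F, i), i))
    return result

# a form of degree 3 in three coordinates: x1^3 - 2*x1*x2*x3 + 4*x2^2*x3
F = {(3, 0, 0): 1, (1, 1, 1): -2, (0, 2, 1): 4}
DF = euler_operator(F, 3)
print(DF == {e: 3 * c for e, c in F.items()})
```

The operator multiplies every coefficient of the form by its degree 3, which is the content of (3.68) for k = 3.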


Example 3.86 Let F(x) be an arbitrary polynomial on the vector space L. For a
variable t ∈ ℝ and fixed vector x ∈ L, the function F(tx), in view of relationships
(3.63) and (3.64), is a polynomial in the variable t. The expression

\[ (d_0 F)(x) = \frac{d}{dt} F(tx) \Big|_{t=0} \tag{3.69} \]




is called the differential of the function $F(x)$ at the point 0. Let us point out that
the right-hand side of equality (3.69) contains the ordinary derivative of $F(tx)$
as a function of the variable $t \in \mathbb{R}$ at the point $t = 0$. On the left-hand side of the
equality (3.69) and in the expression “differential of the function at the point 0,” the
symbol 0 signifies, as usual, the null vector of the space L.

Let us now verify that $(d_0 F)(x)$ is a linear function in $x$. To this end, we use
equality (3.65) for the polynomial $F(tx)$. From the relationship


$$F(tx) = F_0 + t F_1(x) + \cdots + t^k F_k(x),$$


we obtain immediately that 


$$\frac{d}{dt} F(tx) \Big|_{t=0} = F_1(x),$$


where $F_1(x)$ is a linear function on L. Thus in the decomposition (3.63) for the
polynomial $F(x)$, the second term is $F_1(x) = (d_0 F)(x)$, and therefore $d_0 F$ is
frequently called the linear part of the polynomial $F$.

We shall give an expression in coordinates for this important function. Using the 
rules of differentiation for a function of several variables, we obtain 


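$$\frac{d}{dt} F(tx) = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(tx)\, \frac{d(t x_i)}{dt} = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(tx)\, x_i.$$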


Setting t = 0, we obtain from this formula 


$$(d_0 F)(x) = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(0)\, x_i. \tag{3.70}$$


The coordinate representation (3.70) for the differential is quite convenient, but it
requires the selection of a basis $e_1, \ldots, e_n$ in the space L and the notation
$x = x_1 e_1 + \cdots + x_n e_n$. The expression (3.69) alone shows that $(d_0 F)(x)$ does not depend
on the choice of basis. In analysis, both expressions (3.69) and (3.70) are defined 
for functions of a much more general class than polynomials. 

We note that for polynomials $F(x_1, \ldots, x_n) = x_i$, we obtain with the help of
formula (3.70) the expression $(d_0 F)(x) = x_i$. This indicates that the functions
$(d_0 x_1), \ldots, (d_0 x_n)$ form a basis of $L^*$ dual to the basis $e_1, \ldots, e_n$ of L.


Chapter 4 
Linear Transformations of a Vector Space 
to Itself 


4.1 Eigenvectors and Invariant Subspaces 


In the previous chapter we introduced the notion of a linear transformation of a 
vector space L into a vector space M. In this and the following chapters, we shall 
consider the important special case in which M coincides with L, which in this book 
will always be assumed to be finite-dimensional. Then a linear transformation
$\mathcal{A} : L \to L$ will be called a linear transformation of the space L to itself, or simply a
linear transformation of the space L. This case is of great importance, since it is 
encountered frequently in various fields of mathematics, mechanics, and physics. 
We now recall some previously introduced facts regarding this case. First of all, 
as before, we shall understand the term number or scalar in the broadest possible 
sense, namely as a real or complex number or indeed as an element of any field K 
(of the reader’s choosing). 

As established in the preceding chapter, to represent a transformation $\mathcal{A}$ by a
matrix, one has to choose a basis $e_1, \ldots, e_n$ of the space L and then to write the
coordinates of the vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_n)$ in terms of that basis as the columns
of a matrix. The result will be a square matrix $A$ of order $n$. If the transformation
$\mathcal{A}$ of the space L is nonsingular, then the vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_n)$ themselves
form a basis of the space L, and we may interpret $A$ as a transition matrix from
the basis $e_1, \ldots, e_n$ to the basis $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_n)$. A nonsingular transformation $\mathcal{A}$
obviously has an inverse, $\mathcal{A}^{-1}$, with matrix $A^{-1}$.


Example 4.1 Let us write down the matrix of the linear transformation $\mathcal{A}$ that acts
by rotating the plane in the counterclockwise direction about the origin through the
angle $\alpha$. To do so, we first choose a basis consisting of two mutually perpendicular
vectors $e_1$ and $e_2$ of unit length in the plane, where the vector $e_2$ is obtained from
$e_1$ by a counterclockwise rotation through a right angle (see Fig. 4.1).

Then it is easy to see that we obtain the relationship


$$\mathcal{A}(e_1) = \cos\alpha\, e_1 + \sin\alpha\, e_2, \qquad \mathcal{A}(e_2) = -\sin\alpha\, e_1 + \cos\alpha\, e_2,$$


I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry, 133
DOI 10.1007/978-3-642-30994-6_4, © Springer-Verlag Berlin Heidelberg 2013 


Fig. 4.1 Rotation through the angle $\alpha$


and it follows from the definition that the matrix of the transformation $\mathcal{A}$ in the
given basis is equal to

$$A = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}. \tag{4.1}$$


Example 4.2 Consider the linear transformation $\mathcal{A}$ of the complex plane that consists
in multiplying each number $z \in \mathbb{C}$ by a given fixed complex number $p + iq$
(here $i$ is the imaginary unit).

If we consider the complex plane as a vector space L over the field $\mathbb{C}$, then it is
clear that in an arbitrary basis of the space L, such a transformation $\mathcal{A}$ has a matrix of
order 1, consisting of a single element, namely the given complex number $p + iq$.
Thus in this case, we have $\dim L = 1$, and we need to choose in L a basis consisting
of an arbitrary nonzero vector in L, that is, an arbitrary complex number $z \neq 0$. Thus
we obtain $\mathcal{A}(z) = (p + iq)z$.


Now let us consider the complex plane as a vector space L over the field $\mathbb{R}$. In
this case, $\dim L = 2$, since every complex number $z = x + iy$ is represented by a pair
of real numbers $x$ and $y$. Let us choose in L the same basis as in Example 4.1: the
vector $e_1$ lies on the real axis, and the vector $e_2$ on the imaginary
axis. From the equation


$$(x + iy)(p + iq) = (px - qy) + i(py + qx)$$

it follows that

$$\mathcal{A}(e_1) = p e_1 + q e_2, \qquad \mathcal{A}(e_2) = -q e_1 + p e_2,$$


from which it follows by definition that the matrix of the transformation $\mathcal{A}$ in the
given basis takes the form

$$A = \begin{pmatrix} p & -q \\ q & p \end{pmatrix}. \tag{4.2}$$


In the case $|p + iq| = 1$, we may put $p = \cos\alpha$ and $q = \sin\alpha$ for a certain number
$0 \le \alpha < 2\pi$ (such an $\alpha$ is called the argument of the complex number $p + iq$). Then
the matrix (4.2) coincides with (4.1); that is, multiplication by a complex number
with modulus 1 and argument $\alpha$ is equivalent to the counterclockwise rotation about
the origin of the complex plane through the angle $\alpha$. We note that every complex
number $p + iq$ can be expressed as the product of a real number $r$ and a complex
number of modulus 1; that is, $p + iq = r(p' + iq')$, where $|p' + iq'| = 1$ and
$r = |p + iq|$. From this it is clear that multiplication by $p + iq$ is the product of two
linear transformations of the complex plane: a rotation through the angle $\alpha$ and a
dilation (or contraction) by the factor $r$.
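
The agreement between complex multiplication, the matrix (4.2), and the decomposition into a rotation followed by a dilation can be checked numerically. The sketch below uses only Python's standard `cmath` and `math` modules; the helper names `mult_matrix` and `apply` and the sample numbers are assumptions chosen for illustration.

```python
import cmath
import math

def mult_matrix(p, q):
    """Matrix (4.2) of z -> (p + iq) z in the real basis e1 = 1, e2 = i."""
    return [[p, -q], [q, p]]

def apply(matrix, vec):
    """Multiply a 2x2 matrix by a vector with coordinates (x, y)."""
    return [matrix[0][0] * vec[0] + matrix[0][1] * vec[1],
            matrix[1][0] * vec[0] + matrix[1][1] * vec[1]]

p, q = 3.0, 4.0          # the fixed multiplier p + iq
x, y = 2.0, -1.0         # the number z = x + iy being transformed

via_matrix = apply(mult_matrix(p, q), [x, y])
via_complex = complex(x, y) * complex(p, q)    # (2 - i)(3 + 4i) = 10 + 5i

# Polar form: p + iq = r (cos(alpha) + i sin(alpha)).
r, alpha = cmath.polar(complex(p, q))
rotation = [[math.cos(alpha), -math.sin(alpha)],
            [math.sin(alpha), math.cos(alpha)]]   # matrix (4.1)
rotated_then_scaled = [r * c for c in apply(rotation, [x, y])]

print(via_matrix, via_complex, rotated_then_scaled)
```

All three computations give the point with coordinates (10, 5), up to floating-point rounding in the last one.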

In Sect. 3.4, we established that in the transition from a basis $e_1, \ldots, e_n$ of the
space L to some other basis $e'_1, \ldots, e'_n$, the matrix of the transformation is changed
according to the formula


$$A' = C^{-1} A C, \tag{4.3}$$


where C is the transition matrix from the second basis to the first. 


Definition 4.3 Two square matrices A and A’ related by (4.3), where C is any 
nonsingular matrix, are said to be similar. 


It is not difficult to see that in the set of square matrices of a given order, the similarity
relation thus defined is an equivalence relation (see the definition on p. xii).

It follows from formula (4.3) that in changing bases, the determinant of the transformation
matrix does not change, and therefore it is possible to speak not simply
about the determinant of the transformation matrix, but about the determinant of the
linear transformation $\mathcal{A}$ itself, which will be denoted by $|\mathcal{A}|$. A linear transformation
$\mathcal{A} : L \to L$ is nonsingular if and only if $|\mathcal{A}| \neq 0$. If L is a real space, then this
number $|\mathcal{A}| \neq 0$ is also real and can be either positive or negative.


Definition 4.4 A nonsingular linear transformation $\mathcal{A} : L \to L$ of the real space L is
called proper if $|\mathcal{A}| > 0$, and improper if $|\mathcal{A}| < 0$.


One of the basic tasks in the theory of linear transformations, one with which 
we shall be occupied in the sequel, is to find, given a linear transformation of a 
vector space into itself, a basis for which the matrix of the transformation takes the 
simplest possible form. An equivalent formulation of this task is for a given square 
matrix to find the simplest matrix that is similar to it. Having such a basis (or similar 
matrix) gives us the possibility of surveying a number of important properties of the 
initial linear transformation (or matrix). In its most general form, this problem will 
be solved in Chap. 5, but at present, we shall examine it for a particular type of 
linear transformation that is most frequently encountered. 


Definition 4.5 A subspace $L'$ of a vector space L is called invariant with respect to
the linear transformation $\mathcal{A} : L \to L$ if for every vector $x \in L'$, we have $\mathcal{A}(x) \in L'$.


It is clear that according to this definition, the zero subspace (0) and the entire
space L are invariant with respect to any linear transformation $\mathcal{A} : L \to L$. Thus
whenever we enumerate the invariant subspaces of a space L, we shall always mean
the subspaces $L' \subset L$ other than (0) and L.


Example 4.6 Let L be the three-dimensional space studied in courses in analytic
geometry consisting of vectors originating at a given fixed point O, and consider the
transformation $\mathcal{A}$ that reflects each vector with respect to a given plane $L'$ passing
through the point O. It is then easy to see that $\mathcal{A}$ has two invariant subspaces: the
plane $L'$ itself and the straight line $L''$ passing through O and perpendicular to $L'$.


Example 4.7 Let L be the same space as in the previous example, and now let the
transformation $\mathcal{A}$ be a rotation through the angle $\alpha$, $0 < \alpha < \pi$, about a given axis
$L'$ passing through O. Then $\mathcal{A}$ has two invariant subspaces: the line $L'$ itself and the
plane $L''$ perpendicular to $L'$ and passing through O.


Example 4.8 Let L be the same as in the previous example, and let $\mathcal{A}$ be a homothety,
that is, $\mathcal{A}$ acts by multiplying each vector by a fixed number $\alpha \neq 0$. Then it
is easy to see that every line and every plane passing through O is an invariant subspace
with respect to the transformation $\mathcal{A}$. Moreover, it is not difficult to observe
that if $\mathcal{A}$ is a homothety on an arbitrary vector space L, then every subspace of L is
invariant.


Example 4.9 Let L be the plane consisting of all vectors originating at some point
O, and let $\mathcal{A}$ be the transformation that rotates a vector about O through the angle $\alpha$,
$0 < \alpha < \pi$. Then $\mathcal{A}$ has no invariant subspace.


It is evident that the restriction of a linear transformation $\mathcal{A}$ to an invariant subspace
$L' \subset L$ is a linear transformation of $L'$ into itself. We shall denote this transformation
by $\mathcal{A}'$, that is, $\mathcal{A}' : L' \to L'$ and $\mathcal{A}'(x) = \mathcal{A}(x)$ for all $x \in L'$.

Let $e_1, \ldots, e_m$ be a basis of the subspace $L'$. Then since it consists of linearly
independent vectors, it is possible to extend it to a basis $e_1, \ldots, e_n$ of the entire
space L. Let us examine how the matrix of the linear transformation appears in
this basis. The vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_m)$ are expressed as linear combinations of
$e_1, \ldots, e_m$; this is equivalent to saying that $e_1, \ldots, e_m$ is the basis of a subspace that
is invariant with respect to the transformation $\mathcal{A}$. We therefore obtain the system of
equations


$$\begin{aligned}
\mathcal{A}(e_1) &= a_{11} e_1 + a_{21} e_2 + \cdots + a_{m1} e_m, \\
\mathcal{A}(e_2) &= a_{12} e_1 + a_{22} e_2 + \cdots + a_{m2} e_m, \\
&\;\;\vdots \\
\mathcal{A}(e_m) &= a_{1m} e_1 + a_{2m} e_2 + \cdots + a_{mm} e_m.
\end{aligned}$$


It is clear that the matrix


$$A' = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mm} \end{pmatrix} \tag{4.4}$$

is the matrix of the linear transformation $\mathcal{A}' : L' \to L'$ in the basis $e_1, \ldots, e_m$. In
general, we can say nothing about the vectors $\mathcal{A}(e_i)$ for $i > m$ except that they are
linear combinations of vectors from the basis $e_1, \ldots, e_n$ of the entire space L. However,
we shall represent this by separating out terms that are multiples of $e_1, \ldots, e_m$
(we shall write the associated coefficients as $b_{ij}$) and those that are multiples of the
vectors $e_{m+1}, \ldots, e_n$ (here we shall write the associated coefficients as $c_{ij}$). As a
result we obtain the matrix

$$A = \begin{pmatrix} A' & B' \\ 0 & C' \end{pmatrix}, \tag{4.5}$$


where $B'$ is a matrix of type $(m, n - m)$, $C'$ is a square matrix of order $n - m$, and
0 is a matrix of type $(n - m, m)$ all of whose elements are equal to zero.

If it turns out to be possible to find an invariant subspace $L''$ related to the invariant
subspace $L'$ by $L = L' \oplus L''$, then by joining the bases of $L'$ and $L''$, we obtain
a basis for the space L in which the matrix of our linear transformation $\mathcal{A}$ can be
written in the form

$$A = \begin{pmatrix} A' & 0 \\ 0 & C' \end{pmatrix},$$


where $A'$ is the matrix (4.4) and $C'$ is the matrix of the linear transformation obtained
by restricting the transformation $\mathcal{A}$ to the subspace $L''$. Analogously, if


$$L = L_1 \oplus L_2 \oplus \cdots \oplus L_k,$$


where all the $L_i$ are invariant subspaces with respect to the transformation $\mathcal{A}$, then
the matrix of the transformation $\mathcal{A}$ can be written in the form


$$A = \begin{pmatrix} A'_1 & 0 & \cdots & 0 \\ 0 & A'_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A'_k \end{pmatrix}, \tag{4.6}$$


where $A'_i$ is the matrix of the linear transformation obtained by restricting $\mathcal{A}$ to the
invariant subspace $L_i$. Matrices of the form (4.6) are called block-diagonal.

The simplest case is that of an invariant subspace of dimension 1. This subspace
has a basis consisting of a single vector $e \neq 0$, and its invariance is expressed by the
relationship


$$\mathcal{A}(e) = \lambda e \tag{4.7}$$

for some number $\lambda$.


Definition 4.10 If the relationship (4.7) is satisfied for a vector $e \neq 0$, then $e$ is
called an eigenvector, and the number $\lambda$ is called an eigenvalue of the transformation
$\mathcal{A}$.


Given an eigenvalue $\lambda$, it is easy to verify that the set of all vectors $e \in L$ satisfying
the relationship (4.7), including here also the zero vector, forms an invariant
subspace of L. It is called the eigensubspace for the eigenvalue $\lambda$ and is denoted
by $L_\lambda$.


Example 4.11 In Example 4.6, the eigenvectors of the transformation are, first
of all, all the vectors in the plane $L'$ (in this case the eigenvalue is $\lambda = 1$), and
secondly, every vector on the line $L''$ (the eigenvalue is $\lambda = -1$). In Example 4.7,
the eigenvectors are all vectors lying on the line $L'$, and to them corresponds the
eigenvalue $\lambda = 1$. In Example 4.8, every vector in the space is an eigenvector with
eigenvalue $\lambda = \alpha$. Of course all the vectors that we are speaking about are nonzero
vectors.


Example 4.12 Let L be the space consisting of all infinitely differentiable functions,
and let the transformation $\mathcal{A}$ be differentiation, that is, it maps every function $x(t)$ in
L to its derivative $x'(t)$. Then the eigenvectors of $\mathcal{A}$ are the functions $x(t)$, not identically
zero, that are solutions of the differential equation $x'(t) = \lambda x(t)$. One easily
verifies that such solutions are the functions $x(t) = c e^{\lambda t}$, where $c$ is an arbitrary
constant. It follows that to every number $\lambda$ there corresponds a one-dimensional invariant
subspace of the transformation $\mathcal{A}$ consisting of all vectors $x(t) = c e^{\lambda t}$, and
for $c \neq 0$ these are eigenvectors.


There is a convenient method for finding eigenvalues of a transformation $\mathcal{A}$ and
the associated subspaces. We must first choose an arbitrary basis $e_1, \ldots, e_n$ of the
space L and then search for vectors $e$ that satisfy relation (4.7), in the form of the
linear combination


$$e = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n. \tag{4.8}$$


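Let the matrix of the linear transformation $\mathcal{A}$ in the basis $e_1, \ldots, e_n$ be $A = (a_{ij})$.
Then the coordinates $y_1, \ldots, y_n$ of the vector $\mathcal{A}(e)$ in the same basis can be expressed by the
equations


$$\begin{aligned}
y_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n, \\
y_2 &= a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n, \\
&\;\;\vdots \\
y_n &= a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n.
\end{aligned}$$


Now we can write down relation (4.7) in the form


$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= \lambda x_1, \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= \lambda x_2, \\
&\;\;\vdots \\
a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n &= \lambda x_n,
\end{aligned}$$


or equivalently,


$$\begin{aligned}
(a_{11} - \lambda) x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= 0, \\
a_{21} x_1 + (a_{22} - \lambda) x_2 + \cdots + a_{2n} x_n &= 0, \\
&\;\;\vdots \\
a_{n1} x_1 + a_{n2} x_2 + \cdots + (a_{nn} - \lambda) x_n &= 0.
\end{aligned} \tag{4.9}$$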


For the coordinates $x_1, x_2, \ldots, x_n$ of the vector (4.8), we obtain a system of $n$ homogeneous
linear equations. By Corollary 2.13, this system will have a nonzero
solution if and only if the determinant of its matrix is equal to zero. We may write
this condition in the form


$$|A - \lambda E| = 0.$$


Using the formula for the expansion of the determinant, we see that the determinant
$|A - tE|$ is a polynomial in $t$ of degree $n$. It is called the characteristic polynomial
of the transformation $\mathcal{A}$. The eigenvalues of $\mathcal{A}$ are precisely the zeros of this
polynomial.
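
For a matrix of order 2 the whole procedure fits in a few lines. The following Python sketch is restricted to order 2; the function names `char_poly_2x2`, `eigenvalues_2x2`, and `det_shifted` are hypothetical. It forms the characteristic polynomial, finds its zeros with the quadratic formula, and confirms that $|A - \lambda E|$ vanishes at each of them.

```python
import cmath

def char_poly_2x2(A):
    """Coefficients (1, c1, c0) of |A - tE| = t^2 + c1*t + c0."""
    (a, b), (c, d) = A
    return 1, -(a + d), a * d - b * c

def eigenvalues_2x2(A):
    """Zeros of the characteristic polynomial, by the quadratic formula."""
    _, c1, c0 = char_poly_2x2(A)
    root = cmath.sqrt(c1 * c1 - 4 * c0)
    return (-c1 + root) / 2, (-c1 - root) / 2

def det_shifted(A, lam):
    """The determinant |A - lam*E|."""
    (a, b), (c, d) = A
    return (a - lam) * (d - lam) - b * c

A = [[2, 1], [1, 2]]                 # |A - tE| = t^2 - 4t + 3
lam1, lam2 = eigenvalues_2x2(A)      # 3 and 1
print(lam1, lam2, det_shifted(A, lam1), det_shifted(A, lam2))

# Rotation through a right angle (Example 4.1 with alpha = pi/2):
# the zeros are i and -i, so there is no real eigenvalue.
R = [[0, -1], [1, 0]]
print(eigenvalues_2x2(R))
```

The second matrix illustrates Example 4.9: a rotation of the real plane has no eigenvector because its characteristic polynomial has no real zeros.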

Let us prove that the characteristic polynomial is independent of the basis in
which we write down the matrix of the transformation. It is only after we have accomplished
this that we shall have the right to speak of the characteristic polynomial
of the transformation itself and not merely of its matrix in a particular basis.

Indeed, as we have seen (formula (4.3)), in another basis we obtain the matrix
$A' = C^{-1} A C$, where $|C| \neq 0$. For this matrix, the characteristic polynomial is


$$|A' - tE| = |C^{-1} A C - tE| = |C^{-1}(A - tE)C|.$$


Using the formula for the multiplication of determinants and the formula for the
determinant of an inverse matrix, we obtain


$$|C^{-1}(A - tE)C| = |C^{-1}| \cdot |A - tE| \cdot |C| = |A - tE|.$$
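
This invariance can be observed on a small example. The sketch below works with exact rational arithmetic from Python's standard `fractions` module and is limited to matrices of order 2, for which the characteristic polynomial $t^2 - (a + d)t + (ad - bc)$ is determined by the trace and the determinant. The matrices `A` and `C` and the helper names are assumptions chosen for illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_2x2(C):
    """Inverse of a nonsingular 2x2 integer matrix, with exact fractions."""
    (a, b), (c, d) = C
    det = a * d - b * c
    if det == 0:
        raise ValueError("C must be nonsingular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def char_coeffs(A):
    """Trace and determinant, which fix |A - tE| = t^2 - trace*t + det."""
    (a, b), (c, d) = A
    return (a + d, a * d - b * c)

A = [[1, 2], [3, 4]]
C = [[2, 1], [1, 1]]                     # transition matrix, |C| = 1
A_prime = matmul(matmul(inverse_2x2(C), A), C)

print(A_prime)                           # entries -6, -4, 16, 11 (as Fractions)
print(char_coeffs(A) == char_coeffs(A_prime))  # True
```

The matrices $A$ and $A'$ differ entry by entry, yet both have trace 5 and determinant $-2$, hence the same characteristic polynomial $t^2 - 5t - 2$.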


If a space has a basis $e_1, \ldots, e_n$ consisting of eigenvectors, then in this basis, we
have $\mathcal{A}(e_i) = \lambda_i e_i$. From this, it follows that the matrix of a transformation $\mathcal{A}$ in
this basis has the diagonal form


$$\begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$


This is a special case of (4.6) in which the invariant subspaces $L_i$ are one-dimensional,
that is, $L_i = \langle e_i \rangle$. Such linear transformations are called diagonalizable.

As the following example shows, not all transformations are diagonalizable. 




Example 4.13 Let $\mathcal{A}$ be a linear transformation of the (real or complex) plane that
in some basis $e_1, e_2$ has the matrix


$$A = \begin{pmatrix} a & b \\ 0 & a \end{pmatrix}, \qquad b \neq 0.$$


The characteristic polynomial $|A - tE| = (t - a)^2$ of this transformation has a
unique zero $t = a$, of multiplicity 2, to which corresponds the one-dimensional
eigensubspace $\langle e_1 \rangle$. From this it follows that the transformation $\mathcal{A}$ is nondiagonalizable,
since a basis of eigenvectors would require two linearly independent eigenvectors.


This can be proved by another method, using the concept of similar matrices.
If the transformation $\mathcal{A}$ were diagonalizable, then there would exist a nonsingular
matrix $C$ of order 2 that would satisfy the relation $C^{-1} A C = aE$, or equivalently,
the equation $AC = aC$. With respect to the unknown elements of the matrix
$C = (c_{ij})$, the previous equality gives us two equations, $b c_{21} = 0$ and $b c_{22} = 0$, whence
by virtue of $b \neq 0$, it follows that $c_{21} = c_{22} = 0$, and the matrix $C$ is thus seen to be
singular.
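
The dimension of an eigensubspace equals $n$ minus the rank of $A - \lambda E$, so Example 4.13 can be checked by a rank computation. The Python sketch below uses exact fractions and a plain Gaussian elimination; the function names `rank` and `eigenspace_dim` and the sample values $a = 5$, $b = 2$ are assumptions made for illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix, computed by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found

def eigenspace_dim(A, lam):
    """Dimension of the solution space of (A - lam*E) x = 0."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

a, b = 5, 2
print(eigenspace_dim([[a, b], [0, a]], a))  # 1: not diagonalizable
print(eigenspace_dim([[a, 0], [0, a]], a))  # 2: the homothety aE
```

With $b \neq 0$ the eigenvalue $a$ has multiplicity 2 but only a one-dimensional eigensubspace, whereas for the homothety $aE$ the two numbers coincide.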

We have seen that the number of eigenvalues of a linear transformation is finite, 
and it cannot exceed the number n (the dimension of the space L), since they are the 
zeros of the characteristic polynomial, whose degree is n. 


Theorem 4.14 The dimension of the eigensubspace $L_\lambda \subset L$ associated with the
eigenvalue $\lambda$ is at most the multiplicity of the value $\lambda$ as a zero of the characteristic
polynomial.


Proof Suppose the dimension of the eigensubspace $L_\lambda$ is $m$. Let us choose a basis
$e_1, \ldots, e_m$ of this subspace and extend it to a basis $e_1, \ldots, e_n$ of the entire space
L, in which the matrix of the transformation $\mathcal{A}$ has the form (4.5). Since by the
definition of an eigensubspace, $\mathcal{A}(e_i) = \lambda e_i$ for all $i = 1, \ldots, m$, it follows that in
(4.5), the matrix $A'$ is equal to $\lambda E_m$, where $E_m$ is the identity matrix of order $m$.
Then

$$A - tE = \begin{pmatrix} A' - tE_m & B' \\ 0 & C' - tE_{n-m} \end{pmatrix} = \begin{pmatrix} (\lambda - t)E_m & B' \\ 0 & C' - tE_{n-m} \end{pmatrix},$$


where $E_{n-m}$ is the identity matrix of order $n - m$. Therefore,

$$|A - tE| = (\lambda - t)^m\, |C' - tE_{n-m}|.$$


Hence the characteristic polynomial $|A - tE|$ is divisible by $(\lambda - t)^m$, so the
multiplicity of $\lambda$ as a zero of this polynomial is at least $m = \dim L_\lambda$, which is
what we had to show. Note that $\lambda$ may also be a zero of the factor $|C' - tE_{n-m}|$,
as in Example 4.13 (there $m = 1$ and $C' = (a)$), so the inequality can be strict.


In the previous chapter we were introduced to the operations of addition and
multiplication (composition) of linear transformations, which are clearly defined
for the special case of a transformation of a space L into itself. Therefore, for any
integer $n \ge 0$ we may define the $n$th power of a linear transformation. By definition,
$\mathcal{A}^n$ for $n > 0$ is the result of multiplying $\mathcal{A}$ by itself $n$ times, and for $n = 0$, $\mathcal{A}^0$ is the
identity transformation $\mathcal{E}$. This enables us to introduce the concept of a polynomial
in a linear transformation, which will play an important role in what follows.

Let $\mathcal{A}$ be a linear transformation of the vector space L (real, complex, or over an
arbitrary field $\mathbb{K}$) and define


$$f(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_k x^k,$$


a polynomial with scalar coefficients (respectively real, complex, or from the
field $\mathbb{K}$).


Definition 4.15 A polynomial $f$ in the linear transformation $\mathcal{A}$ is a linear mapping

$$f(\mathcal{A}) = \alpha_0 \mathcal{E} + \alpha_1 \mathcal{A} + \cdots + \alpha_k \mathcal{A}^k, \tag{4.10}$$

where $\mathcal{E}$ is the identity linear transformation.


We observe that this definition does not make use of coordinates, that is, the
choice of a specific basis in the space L. If such a basis $e_1, \ldots, e_n$ is chosen, then to
the linear transformation $\mathcal{A}$ there corresponds a unique square matrix $A$. In Sect. 2.9
we introduced the notion of a polynomial in a square matrix, which allows us to give
another definition: $f(\mathcal{A})$ is the linear transformation with matrix


$$f(A) = \alpha_0 E + \alpha_1 A + \cdots + \alpha_k A^k \tag{4.11}$$


in the basis $e_1, \ldots, e_n$.

It is not difficult to be convinced of the equivalence of these definitions if we
recall that the actions of linear transformations are expressed through the actions
of their matrices (see Sect. 3.3). It is thus necessary to show that in a change of
basis from $e_1, \ldots, e_n$, the matrix $f(A)$ also changes according to formula (4.3)
with transition matrix $C$ the same as for matrix $A$. Indeed, let us consider a change of
coordinates (that is, switching to another basis of the space L) with matrix $C$. Then in
the new basis, the matrix of the transformation $\mathcal{A}$ is given by $A' = C^{-1} A C$. By the
associativity of matrix multiplication, we also obtain a relationship $A'^n = C^{-1} A^n C$
for every integer $n \ge 0$. If we substitute $A'$ for $A$ in formula (4.11), then considering
what we have said, we obtain


$$\begin{aligned}
f(A') &= \alpha_0 E + \alpha_1 A' + \cdots + \alpha_k A'^k \\
&= C^{-1}\bigl(\alpha_0 E + \alpha_1 A + \cdots + \alpha_k A^k\bigr) C = C^{-1} f(A) C,
\end{aligned}$$


which proves our assertion. 
It should be clear that the statements that we proved in Sect. 2.9 for polynomials 
in a matrix (p. 69) also apply to polynomials in a linear transformation. 




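Lemma 4.16 If $f(x) + g(x) = u(x)$ and $f(x)g(x) = v(x)$, then for an arbitrary
linear transformation $\mathcal{A}$, we have


$$f(\mathcal{A}) + g(\mathcal{A}) = u(\mathcal{A}), \tag{4.12}$$
$$f(\mathcal{A})\, g(\mathcal{A}) = v(\mathcal{A}). \tag{4.13}$$


Corollary 4.17 Polynomials $f(\mathcal{A})$ and $g(\mathcal{A})$ in the same linear transformation $\mathcal{A}$
commute: $f(\mathcal{A})g(\mathcal{A}) = g(\mathcal{A})f(\mathcal{A})$.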
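
Formula (4.13) and Corollary 4.17 can be tried out on a concrete matrix. The Python sketch below evaluates a polynomial in a square matrix by Horner's scheme; storing a polynomial as the list of its coefficients $[\alpha_0, \ldots, \alpha_k]$, and the helper names `matmul`, `poly_of_matrix`, and `poly_mul`, are assumptions made for illustration.

```python
def matmul(X, Y):
    """Product of two square matrices of the same order."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, A):
    """Evaluate f(A) = a0*E + a1*A + ... + ak*A^k, coeffs = [a0, ..., ak]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):
        # Horner step: result <- result * A + a * E
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += a
    return result

def poly_mul(f, g):
    """Coefficients of the product polynomial f(x) g(x)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

A = [[1, 2], [3, 4]]
f = [1, 0, 1]      # f(x) = 1 + x^2
g = [2, -1]        # g(x) = 2 - x

fA, gA = poly_of_matrix(f, A), poly_of_matrix(g, A)
vA = poly_of_matrix(poly_mul(f, g), A)   # v(x) = f(x) g(x)

print(matmul(fA, gA) == vA)               # True, as in (4.13)
print(matmul(fA, gA) == matmul(gA, fA))   # True, as in Corollary 4.17
```

Commutativity here is special to polynomials in one and the same matrix; two arbitrary matrices need not commute.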


4.2 Complex and Real Vector Spaces 


We shall now investigate in greater detail the concepts introduced in the previous 
section applied to transformations of complex and real vector spaces (that is, we 
shall assume that the field K is respectively C or R). Our fundamental result applies 
specifically to complex spaces. 


Theorem 4.18 Every linear transformation of a complex vector space has an eigenvector.


This follows immediately from the fact that the characteristic polynomial of a 
linear transformation, and in general an arbitrary polynomial of positive degree, has 
a complex root. Nevertheless, as Example 4.13 of the previous section shows, even 
in a complex space, not every linear transformation is diagonalizable. 

Let us consider the question of diagonalizability in greater detail, always assuming
that we are working with complex spaces. We shall prove the diagonalizability
of a commonly occurring type of transformation. To this end, we require the following
lemma.


Lemma 4.19 Eigenvectors associated with distinct eigenvalues are linearly independent.


Proof Suppose the eigenvectors $e_1, \ldots, e_m$ are associated with distinct eigenvalues
$\lambda_1, \ldots, \lambda_m$:


$$\mathcal{A}(e_i) = \lambda_i e_i, \qquad i = 1, \ldots, m.$$


We shall prove the lemma by induction on the number $m$ of vectors. For the case
$m = 1$, the result follows from the definition of an eigenvector, namely that $e_1 \neq 0$.
Let us assume that there exists a linear dependence


$$\alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_m e_m = 0. \tag{4.14}$$

Applying the transformation $\mathcal{A}$ to both sides of the equation, we obtain


$$\lambda_1 \alpha_1 e_1 + \lambda_2 \alpha_2 e_2 + \cdots + \lambda_m \alpha_m e_m = 0. \tag{4.15}$$


Subtracting (4.14) multiplied by $\lambda_m$ from (4.15), we obtain

$$\alpha_1(\lambda_1 - \lambda_m) e_1 + \alpha_2(\lambda_2 - \lambda_m) e_2 + \cdots + \alpha_{m-1}(\lambda_{m-1} - \lambda_m) e_{m-1} = 0.$$


By our induction hypothesis, we may consider that the lemma has been proved
for $m - 1$ vectors $e_1, \ldots, e_{m-1}$. Thus we obtain that $\alpha_1(\lambda_1 - \lambda_m) = 0$, ...,
$\alpha_{m-1}(\lambda_{m-1} - \lambda_m) = 0$, and since by the condition in the lemma, $\lambda_1 \neq \lambda_m$, ...,
$\lambda_{m-1} \neq \lambda_m$, it follows that $\alpha_1 = \cdots = \alpha_{m-1} = 0$. Substituting this into (4.14), we
arrive at the relationship $\alpha_m e_m = 0$, that is (by the definition of an eigenvector),
$\alpha_m = 0$. Therefore, in (4.14), all the $\alpha_i$ are equal to zero, which demonstrates the
linear independence of $e_1, \ldots, e_m$.


By Lemma 4.19, we have the following result. 


Theorem 4.20 A linear transformation on a complex vector space is diagonalizable 
if its characteristic polynomial has no multiple roots. 


As is well known, in this case, the characteristic polynomial has $n$ distinct roots
(we recall once again that we are speaking about polynomials over the field of complex
numbers).


Proof of Theorem 4.20 Let $\lambda_1, \ldots, \lambda_n$ be the distinct roots of the characteristic polynomial
of the transformation $\mathcal{A}$ and let $e_1, \ldots, e_n$ be the corresponding eigenvectors.
It suffices to show that these vectors form a basis of the entire space. Since
their number is equal to the dimension of the space, this is equivalent to showing 
their linear independence, which follows from Lemma 4.19. 


If $A$ is the matrix of the transformation $\mathcal{A}$ in some basis, then the condition of
Theorem 4.20 is satisfied if and only if the so-called discriminant of the characteristic
polynomial is nonzero.¹ For example, if the order of a matrix $A$ is 2, and


$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$


then


$$|A - tE| = \begin{vmatrix} a - t & b \\ c & d - t \end{vmatrix} = (a - t)(d - t) - bc = t^2 - (a + d)t + ad - bc.$$


The condition that this quadratic trinomial have two distinct roots is that
$(a + d)^2 - 4(ad - bc) \neq 0$. This can be rewritten in the form


$$(a - d)^2 + 4bc \neq 0. \tag{4.16}$$
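
Condition (4.16) is a one-line test on the entries of the matrix. The following Python sketch applies it to a few matrices of order 2; the function name `has_distinct_eigenvalues` and the sample entries are assumptions chosen for illustration.

```python
def has_distinct_eigenvalues(a, b, c, d):
    """Test condition (4.16) for the matrix [[a, b], [c, d]]."""
    return (a - d) ** 2 + 4 * b * c != 0

# The matrix of Example 4.13 with a = 5, b = 2: a double root, (4.16) fails.
print(has_distinct_eigenvalues(5, 2, 0, 5))    # False

# Rotation through a right angle: distinct complex roots i and -i.
print(has_distinct_eigenvalues(0, -1, 1, 0))   # True

# A matrix with two distinct real roots.
print(has_distinct_eigenvalues(1, 2, 3, 4))    # True
```

When the test returns True, Theorem 4.20 guarantees diagonalizability over the complex numbers. When it returns False, nothing is decided: the homothety $aE$ also fails (4.16) and is nevertheless diagonal.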


¹For the general notion of the discriminant of a polynomial, see, for instance, Polynomials, by
Victor V. Prasolov, Springer 2004. 




Similarly, for complex vector spaces of arbitrary dimension, linear transformations
not satisfying the conditions of Theorem 4.20 have a matrix that, regardless
of the basis, has elements that satisfy a special algebraic relationship. In this sense,
only exceptional transformations do not meet the conditions of Theorem 4.20.

Analogous considerations give necessary and sufficient conditions for a linear 
transformation to be diagonalizable. 


Theorem 4.21 A linear transformation of a complex vector space is diagonalizable
if and only if for each of its eigenvalues $\lambda$, the dimension of the corresponding
eigenspace $L_\lambda$ is equal to the multiplicity of $\lambda$ as a root of the characteristic polynomial.


In other words, the bound on the dimension of the subspace $L_\lambda$ obtained in Theorem 4.14
is attained.


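Proof of Theorem 4.21 Let the transformation $\mathcal{A}$ be diagonalizable, that is, in some
basis $e_1, \ldots, e_n$ it has the matrix


$$A = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

It is possible to arrange the eigenvalues $\lambda_1, \ldots, \lambda_n$ so that those that are equal are
next to each other, so that altogether, they have the form


$$\underbrace{\lambda_1, \ldots, \lambda_1}_{m_1 \text{ times}},\ \underbrace{\lambda_2, \ldots, \lambda_2}_{m_2 \text{ times}},\ \ldots,\ \underbrace{\lambda_k, \ldots, \lambda_k}_{m_k \text{ times}},$$

where all the numbers $\lambda_1, \ldots, \lambda_k$ are distinct. In other words, we can write the
matrix $A$ in the block-diagonal form


$$A = \begin{pmatrix} \lambda_1 E_{m_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 E_{m_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k E_{m_k} \end{pmatrix}, \tag{4.17}$$


where $E_{m_i}$ is the identity matrix of order $m_i$. Then

$$|A - tE| = (\lambda_1 - t)^{m_1} (\lambda_2 - t)^{m_2} \cdots (\lambda_k - t)^{m_k},$$


that is, the number $\lambda_i$ is a root of multiplicity $m_i$ of the characteristic equation.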
On the other hand, the equality $\mathcal{A}(x) = \lambda_i x$ for vectors $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$
gives the relationship $\lambda_s \alpha_j = \lambda_i \alpha_j$ for all $j = 1, \ldots, n$, where $\lambda_s$, $s = 1, \ldots, k$,
is the eigenvalue corresponding to the basis vector $e_j$; that is,
either $\alpha_j = 0$ or $\lambda_s = \lambda_i$. In other words, the vector $x$ is a linear combination only
of those eigenvectors $e_j$ that correspond to the eigenvalue $\lambda_i$. This means that the
subspace $L_{\lambda_i}$ consists of all linear combinations of such vectors, and consequently,
$\dim L_{\lambda_i} = m_i$.

Conversely, for distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, let the dimension of the eigensubspace
$L_{\lambda_i}$ be equal to the multiplicity $m_i$ of the number $\lambda_i$ as a root of the characteristic
polynomial. Then from known properties of polynomials, it follows that
$m_1 + \cdots + m_k = n$, which means that


$$\dim L_{\lambda_1} + \cdots + \dim L_{\lambda_k} = \dim L. \tag{4.18}$$


We shall show that the sum $L_{\lambda_1} + \cdots + L_{\lambda_k}$ is a direct sum of the eigensubspaces
$L_{\lambda_i}$. To do so, it suffices to show that for all vectors $x_1 \in L_{\lambda_1}$, ..., $x_k \in L_{\lambda_k}$, the
equality $x_1 + \cdots + x_k = 0$ is possible only in the case that $x_1 = \cdots = x_k = 0$. But
since the nonzero vectors among $x_1, \ldots, x_k$ are eigenvectors of the transformation $\mathcal{A}$
corresponding to distinct eigenvalues, the required assertion follows by Lemma 4.19. Therefore,
by equality (4.18), we have the decomposition


$$L = L_{\lambda_1} \oplus \cdots \oplus L_{\lambda_k}.$$


Having chosen from each eigensubspace $L_{\lambda_i}$, $i = 1, \ldots, k$, a basis (consisting of $m_i$
vectors), and having ordered them in such a way that the vectors entering into a
particular subspace $L_{\lambda_i}$ are adjacent, we obtain a basis of the space L in which the
matrix $A$ of the transformation $\mathcal{A}$ has the form (4.17). This means that the transformation
$\mathcal{A}$ is diagonalizable.


The case of real vector spaces is more frequently encountered in applications. 
Their study proceeds in almost the same way as with complex vector spaces, except 
that the results are somewhat more complicated. We shall introduce here a proof of 
the real analogue of Theorem 4.18. 


Theorem 4.22 Every linear transformation of a real vector space of dimension 
n > 2 has either a one-dimensional or two-dimensional invariant subspace. 


Proof Let $\mathcal{A}$ be a linear transformation of a real vector space L of dimension
$n > 2$, and let $x \in L$ be some nonnull vector. Since the collection
$x, \mathcal{A}(x), \mathcal{A}^2(x), \ldots, \mathcal{A}^n(x)$ consists of $n + 1 > \dim L$ vectors, then by the definition of the dimension
of a vector space, these vectors must be linearly dependent. This means that there
exist real numbers $\alpha_0, \alpha_1, \ldots, \alpha_n$, not all zero, such that


$$\alpha_0 x + \alpha_1 \mathcal{A}(x) + \alpha_2 \mathcal{A}^2(x) + \cdots + \alpha_n \mathcal{A}^n(x) = 0. \tag{4.19}$$


Consider the polynomial $P(t) = \alpha_0 + \alpha_1 t + \cdots + \alpha_n t^n$ and substitute for the variable
$t$ the transformation $\mathcal{A}$, as was done in Sect. 4.1 (formula (4.10)). Then the equality
(4.19) can be written in the form


$$P(\mathcal{A})(x) = 0. \tag{4.20}$$




A polynomial $P(t)$ satisfying equality (4.20) is called an annihilator polynomial of
the vector $x$ (where it is implied that it is relative to the given transformation $\mathcal{A}$).

Let us assume that the annihilator polynomial $P(t)$ of some vector $x \neq 0$ is the
product of two polynomials of lower degree: $P(t) = Q_1(t) Q_2(t)$. Then by definition
(4.20) and formula (4.13) from the previous section, we have $Q_1(\mathcal{A}) Q_2(\mathcal{A})(x) = 0$.
Then either $Q_2(\mathcal{A})(x) = 0$, and hence the vector $x$ is annihilated by an annihilator
polynomial $Q_2(t)$ of lower degree, or else $Q_2(\mathcal{A})(x) \neq 0$. In the latter case, setting
$y = Q_2(\mathcal{A})(x)$, we obtain the equality $Q_1(\mathcal{A})(y) = 0$, which means that the nonnull
vector $y$ is annihilated by the annihilator polynomial $Q_1(t)$ of lower degree. As
is well known, an arbitrary polynomial with real coefficients is a product of polynomials
of first and second degree. Applying to $P(t)$ as many times as necessary the
process described above, we finally arrive at a polynomial $Q(t)$ of first or second
degree and a nonnull vector $z$ such that $Q(\mathcal{A})(z) = 0$. This is the real analogue of
Theorem 4.18.

Factoring out the coefficient of the high-order term of Q(t), we may assume that 
this coefficient is equal to 1. If the degree of Q(t) is equal to 1, then Q(t) = t —A for 
some A, and the equality Q()(z) = 0 yields (A —2&)(z) = 0. This means that A is 
an eigenvalue of z, which is an eigenvector of the transformation A, and therefore, 
(z) is a one-dimensional invariant subspace of the transformation A. 

If the degree of $Q(t)$ is equal to 2, then $Q(t) = t^2 + pt + q$ and
$(\mathcal{A}^2 + p\mathcal{A} + q\mathcal{E})(z) = 0$. In this case, the subspace $L' = \langle z, \mathcal{A}(z) \rangle$ is two-dimensional and is
invariant with respect to $\mathcal{A}$. Indeed, the vectors $z$ and $\mathcal{A}(z)$ are linearly independent,
since otherwise, we would have the case of an eigenvector $z$ considered above. This
means that $\dim L' = 2$. We shall show that $L'$ is an invariant subspace of the
transformation $\mathcal{A}$. Let $x = \alpha z + \beta \mathcal{A}(z)$. To show that $\mathcal{A}(x) \in L'$, it suffices to verify that
the vectors $\mathcal{A}(z)$ and $\mathcal{A}(\mathcal{A}(z))$ belong to $L'$. This holds for the former by the definition
of $L'$. It holds for the latter by the fact that $\mathcal{A}(\mathcal{A}(z)) = \mathcal{A}^2(z)$ and by the relation
$\mathcal{A}^2(z) + p\mathcal{A}(z) + qz = 0$ established above, that is, $\mathcal{A}^2(z) = -qz - p\mathcal{A}(z)$.
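
The degree-2 case can be checked numerically. The following is a minimal sketch under stated assumptions: the 3×3 integer matrix `A`, the vector `z`, and the coefficients `p`, `q` are hypothetical values chosen for illustration (they are not taken from the text), picked so that $\mathcal{A}^2(z) + p\mathcal{A}(z) + qz = 0$. The code confirms that the image of a vector of $\langle z, \mathcal{A}(z) \rangle$ is again a combination of $z$ and $\mathcal{A}(z)$, with the coefficients predicted by $\mathcal{A}^2(z) = -qz - p\mathcal{A}(z)$.

```python
def mat_vec(A, x):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def lin_comb(c1, v1, c2, v2):
    """Return the vector c1*v1 + c2*v2."""
    return [c1 * a + c2 * b for a, b in zip(v1, v2)]


# Hypothetical data: z is not an eigenvector of A, but Q(t) = t^2 - 4t + 5
# annihilates it, so p = -4 and q = 5.
A = [[1, -2, 0],
     [1,  3, 0],
     [0,  0, 5]]
z = [1, 0, 0]
p, q = -4, 5

Az = mat_vec(A, z)      # A(z)
AAz = mat_vec(A, Az)    # A^2(z)

# Q(A)(z) = A^2(z) + p*A(z) + q*z should be the null vector.
annihilated = [s + p * t + q * u for s, t, u in zip(AAz, Az, z)]

# Take x = alpha*z + beta*A(z) in L' and compare A(x) with the prediction
# A(x) = (-beta*q)*z + (alpha - beta*p)*A(z), which lies in L'.
alpha, beta = 2, 3
x = lin_comb(alpha, z, beta, Az)
Ax = mat_vec(A, x)
predicted = lin_comb(-beta * q, z, alpha - beta * p, Az)

print(annihilated, Ax, predicted)
```

The prediction uses nothing beyond linearity and the relation $\mathcal{A}^2(z) = -qz - p\mathcal{A}(z)$, which is exactly the argument of the proof.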


Let us discuss the concept of the annihilator polynomial that we encountered in
the proof of Theorem 4.22. An annihilator polynomial of a vector $x \neq 0$ having
minimal degree is called a minimal polynomial of the vector $x$.


Theorem 4.23 Every annihilator polynomial is divisible by a minimal polynomial. 


Proof Let $P(t)$ be an annihilator polynomial of the vector $x \neq 0$, and $Q(t)$ a minimal
polynomial. Let us suppose that $P$ is not divisible by $Q$. We divide $P$ by $Q$ with
remainder. This gives the equality $P = UQ + R$, where $U$ and $R$ are polynomials
in $t$, and moreover, $R$ is not identically zero, and the degree of $R$ is less than that
of $Q$. If we substitute into this equality the transformation $\mathcal{A}$ for the variable $t$, then
by formulas (4.12) and (4.13), we obtain that


$$P(\mathcal{A})(x) = U(\mathcal{A}) Q(\mathcal{A})(x) + R(\mathcal{A})(x), \tag{4.21}$$




and since P and Q are annihilator polynomials of the vector x, it follows that 
R(A)(x) = 0. Since the degree of R is less than that of Q, this contradicts the 
minimality of the polynomial Q. 


Corollary 4.24 The minimal polynomial of a vector $x \neq 0$ is uniquely defined up to
a constant factor.


Let us note that for annihilator polynomials, the converse of Theorem 4.23 also
holds: any multiple of an annihilator polynomial is also an annihilator polynomial
(of course, of the same vector $x$). This follows from the fact that in this case, in
equality (4.21), we have $R = 0$. From this follows the assertion that there exists a
single polynomial that is an annihilator for all vectors of the space $L$. Indeed, let
$e_1, \ldots, e_n$ be some basis of the space $L$, and let $P_1, \ldots, P_n$ be annihilator polynomials
for these vectors. Let us denote by $Q$ the least common multiple of these
polynomials. Then from what we have said above, it follows that $Q$ is an annihilator
polynomial for each of the vectors $e_1, \ldots, e_n$; that is, $Q(\mathcal{A})(e_i) = 0$ for all
$i = 1, \ldots, n$. We shall prove that $Q$ is an annihilator polynomial for every vector
$x \in L$. By definition, $x$ is a linear combination of vectors of a basis, that is,
$x = \alpha_1 e_1 + \alpha_2 e_2 + \cdots + \alpha_n e_n$. Then


$$\begin{aligned}
Q(\mathcal{A})(x) &= Q(\mathcal{A})(\alpha_1 e_1 + \cdots + \alpha_n e_n) \\
&= \alpha_1 Q(\mathcal{A})(e_1) + \cdots + \alpha_n Q(\mathcal{A})(e_n) \\
&= 0.
\end{aligned}$$
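
The least-common-multiple argument can be illustrated on a small case. The sketch below is an illustration only: the 2×2 matrix `A` and the polynomials `P1`, `P2` are hypothetical choices, and the least common multiple `Q` was worked out by hand rather than computed. The helper `poly_at_matrix` substitutes a matrix into a polynomial with numerical coefficients by Horner's rule.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_vec(A, x):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def poly_at_matrix(coeffs, A):
    """Evaluate c0*E + c1*A + ... + cm*A^m, where coeffs[i] multiplies t^i."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):          # Horner: result = result*A + c*E
        result = mat_mul(result, A)
        for i in range(n):
            result[i][i] += c
    return result


# Hypothetical transformation and the standard basis e1, e2.
A = [[1, 1],
     [0, 2]]
e1, e2 = [1, 0], [0, 1]

P1 = [-1, 1]       # t - 1 annihilates e1, because A(e1) = e1
P2 = [2, -3, 1]    # t^2 - 3t + 2 = (t - 1)(t - 2) annihilates e2
Q = [2, -3, 1]     # least common multiple of P1 and P2 (computed by hand)

P1_e1 = mat_vec(poly_at_matrix(P1, A), e1)   # null vector
P1_e2 = mat_vec(poly_at_matrix(P1, A), e2)   # not null: P1 alone is not enough
Q_A = poly_at_matrix(Q, A)                   # null matrix
Q_x = mat_vec(Q_A, [7, -3])                  # so every vector is annihilated

print(P1_e1, P1_e2, Q_A, Q_x)
```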


Definition 4.25 A polynomial that annihilates every vector of a space $L$ is called an
annihilator polynomial of this space (keeping in mind that we mean for the given
linear transformation $\mathcal{A} : L \to L$).


In conclusion, let us compare the arguments used in the proofs of Theorems 4.18 
and 4.22. In the first case, we relied on the existence of a root (that is, a factor of 
degree 1) of the characteristic polynomial, while in the latter case, we required the 
existence of a simplest factor (of degree 1 or 2) for the annihilator polynomial. The 
connection between these polynomials relies on a result that is important in and of 
itself. It is called the Cayley–Hamilton theorem.


Theorem 4.26 The characteristic polynomial is an annihilator polynomial for its 
associated vector space. 


The proof of this theorem is based on arguments analogous to those used in the 
proof of Lemma 4.19, but relating to a much more general situation. We shall now 
consider polynomials in the variable t whose coefficients are not numbers, but linear 
transformations of the vector space L into itself or (which is the same thing if some 
fixed basis has been chosen in $L$) square matrices $P_i$:


$$P(t) = P_0 + P_1 t + \cdots + P_k t^k.$$


One can work with these as with ordinary polynomials if one assumes that the variable
$t$ commutes with the coefficients. It is also possible to substitute for $t$ the matrix
$A$ of a linear transformation. We shall denote the result of this substitution by $P(A)$,
that is,


$$P(A) = P_0 + P_1 A + \cdots + P_k A^k.$$


It is important here that $t$ and $A$ are written to the right of the coefficients $P_i$. Further,
we shall consider the situation in which $P_i$ and $A$ are square matrices of one and the
same order. In view of what we have said above, all assertions will be true as well
for the case that in the last formula, instead of the matrices $P_i$ and $A$ we have the
linear transformations $\mathcal{P}_i$ and $\mathcal{A}$ of some vector space $L$ into itself:


$$P(\mathcal{A}) = \mathcal{P}_0 + \mathcal{P}_1 \mathcal{A} + \cdots + \mathcal{P}_k \mathcal{A}^k.$$


However, in this case, the analogue of formula (4.13) from Sect. 4.1 does not
hold. That is, if the polynomial $R(t)$ is equal to $P(t)Q(t)$ and $A$ is the matrix of
an arbitrary linear transformation of the vector space $L$, then generally speaking,
$R(A) \neq P(A) Q(A)$. For example, if we have polynomials $P = P_1 t$ and $Q = Q_0$,
then $P_1 t Q_0 = P_1 Q_0 t$, but it is not true that $P_1 A Q_0 = P_1 Q_0 A$ for an arbitrary matrix
$A$, since the matrices $A$ and $Q_0$ do not necessarily commute. However, there is one
important special case in which formula (4.13) holds.
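
The failure of the product rule, and its recovery under commutation, can be seen on 2×2 matrices. In this sketch, the matrices `P1`, `Q0`, `A_bad`, and `A_good` are hypothetical values chosen for illustration; `A_good` is a combination of the identity and `Q0`, so it commutes with `Q0`, whereas `A_bad` does not.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Coefficients of P(t) = P1*t and Q(t) = Q0 (hypothetical values).
P1 = [[1, 2],
      [3, 4]]
Q0 = [[0, 1],
      [0, 0]]


def R_at(A):
    """R(t) = P(t)Q(t) = P1*Q0*t, so R(A) = P1*Q0*A."""
    return mat_mul(mat_mul(P1, Q0), A)


def PQ_at(A):
    """P(A)*Q(A) = (P1*A)*Q0."""
    return mat_mul(mat_mul(P1, A), Q0)


A_bad = [[1, 0],
         [0, 2]]     # does not commute with Q0
A_good = [[5, 7],
          [0, 5]]    # equals 5*E + 7*Q0, hence commutes with Q0

print(R_at(A_bad), PQ_at(A_bad))     # two different matrices
print(R_at(A_good), PQ_at(A_good))   # the same matrix twice
```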


Lemma 4.27 Let

$$P(t) = P_0 + P_1 t + \cdots + P_k t^k, \qquad Q(t) = Q_0 + Q_1 t + \cdots + Q_l t^l,$$


and suppose that the polynomial $R(t)$ equals $P(t)Q(t)$. Then $R(A) = P(A) Q(A)$
if the matrix $A$ commutes with every coefficient of the polynomial $Q(t)$, that is,
$A Q_i = Q_i A$ for all $i = 0, 1, \ldots, l$.


Proof It is not difficult to see that the polynomial $R(t) = P(t) Q(t)$ can be represented
in the form $R(t) = R_0 + R_1 t + \cdots + R_{k+l} t^{k+l}$ with coefficients
$R_s = \sum_{i=0}^{s} P_i Q_{s-i}$, where $P_i = 0$ if $i > k$, and $Q_i = 0$ if $i > l$. Similarly, the product
$P(A) Q(A)$ can be expressed in the form


$$P(A) Q(A) = \sum_{s=0}^{k+l} \Biggl( \sum_{i=0}^{s} P_i A^i Q_{s-i} A^{s-i} \Biggr)$$


with the same conditions: $P_i = 0$ if $i > k$, and $Q_i = 0$ if $i > l$. By the condition of
the lemma, $A Q_j = Q_j A$, whence by induction, we easily obtain that $A^i Q_j = Q_j A^i$
for every choice of $i$ and $j$. Thus our expression takes the form


$$P(A) Q(A) = \sum_{s=0}^{k+l} \Biggl( \sum_{i=0}^{s} P_i Q_{s-i} \Biggr) A^s = R(A).$$




Of course, the analogous assertion holds for all polynomials for which the variable
$t$ stands to the left of the coefficients (then the matrix $A$ must commute with
every coefficient of the polynomial $P$, and not $Q$).

Using Lemma 4.27, we can prove the Cayley–Hamilton theorem.


Proof of Theorem 4.26 Let us consider the matrix $tE - A$ and denote its determinant
by $\varphi(t) = |tE - A|$. The coefficients of the polynomial $\varphi(t)$ are numbers, and as is
easily seen, it is equal to the characteristic polynomial of the matrix $A$ multiplied by $(-1)^n$
(in order to make the coefficient of $t^n$ equal to 1). Let us denote by $B(t)$ the adjugate
matrix to $tE - A$ (see the definition on p. 73). It is clear that $B(t)$ will contain as
its elements certain polynomials in $t$ of degree at most $n - 1$, and consequently, we
may write it in the form $B(t) = B_0 + B_1 t + \cdots + B_{n-1} t^{n-1}$, where the $B_i$ are certain
matrices. Formula (2.70) for the adjugate matrix yields


$$B(t)(tE - A) = \varphi(t) E. \tag{4.22}$$


Let us substitute into formula (4.22) in place of the variable $t$ the matrix $A$ of the
linear transformation $\mathcal{A}$ with respect to some basis of the vector space $L$. Since the
matrix $A$ commutes with the identity matrix $E$ and with itself, then by Lemma 4.27,
we obtain the matrix equality $B(A)(AE - A) = \varphi(A)E$, the left-hand side of which
is equal to the null matrix. Hence $\varphi(A)$ is the null matrix. It is clear that in an arbitrary
basis, the null matrix is the matrix of the null transformation $\mathcal{O} : L \to L$, and
consequently, $\varphi(\mathcal{A}) = \mathcal{O}$. And this is the assertion of Theorem 4.26.


In particular, it is now clear that by the proof of Theorem 4.22, we may take as 
the annihilator polynomial the characteristic polynomial of the transformation A. 
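
For matrices of order 2, both the adjugate identity (4.22) and the Cayley–Hamilton theorem can be verified directly. The sketch below is an illustration under assumptions: the matrix `A` is a hypothetical example, and the helper names are not from the text. For order 2 we have $\varphi(t) = t^2 - (\operatorname{tr} A)\,t + |A|$, and since $(-1)^2 = 1$, this coincides with the characteristic polynomial.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, 2],
     [3, 4]]                                   # hypothetical example
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]


def phi(t):
    """phi(t) = |tE - A| for a matrix of order 2."""
    return t * t - trace * t + det


def tE_minus_A(t):
    return [[t - A[0][0], -A[0][1]],
            [-A[1][0], t - A[1][1]]]


def adjugate_of_tE_minus_A(t):
    """B(t): the adjugate of a 2x2 matrix swaps the diagonal and negates the rest."""
    return [[t - A[1][1], A[0][1]],
            [A[1][0], t - A[0][0]]]


# Identity (4.22), B(t)(tE - A) = phi(t)E, checked at a few integer values of t.
identity_holds = all(
    mat_mul(adjugate_of_tE_minus_A(t), tE_minus_A(t)) == [[phi(t), 0], [0, phi(t)]]
    for t in range(-3, 4)
)

# Cayley-Hamilton: phi(A) = A^2 - trace*A + det*E is the null matrix.
A2 = mat_mul(A, A)
phi_of_A = [[A2[i][j] - trace * A[i][j] + (det if i == j else 0)
             for j in range(2)] for i in range(2)]

print(identity_holds, phi_of_A)
```

Checking (4.22) at finitely many values of $t$ is only a sanity check, of course; the proof above is what establishes the identity of polynomials.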


4.3 Complexification 


In view of the fact that real vector spaces are encountered especially frequently in 
applications, we present here another method of determining the properties of linear 
transformations of such spaces, proceeding from already proved properties of linear 
transformations of complex spaces. 

Let $L$ be a finite-dimensional real vector space. In order to apply our previously
worked-out arguments, it will be necessary to embed it in some complex space $L^{\mathbb{C}}$.
For this, we shall use the fact that, as we saw in Sect. 3.5, $L$ is isomorphic to the
space of rows of length $n$ (where $n = \dim L$), which we denote by $\mathbb{R}^n$.

In view of the usual set inclusion $\mathbb{R} \subset \mathbb{C}$, we may consider $\mathbb{R}^n$ a subset of $\mathbb{C}^n$. In
this case, it is not, of course, a subspace of $\mathbb{C}^n$ as a vector space over the field $\mathbb{C}$.
For example, multiplication by the complex scalar $i$ does not take $\mathbb{R}^n$ into itself.
Instead, as is easily seen, we have the decomposition


$$\mathbb{C}^n = \mathbb{R}^n \oplus i\,\mathbb{R}^n$$




(let us recall that in $\mathbb{C}^n$, multiplication by $i$ is defined for all vectors, and in particular
for vectors in the subset $\mathbb{R}^n$). We shall now denote $\mathbb{R}^n$ by $L$, while $\mathbb{C}^n$ will be denoted
by $L^{\mathbb{C}}$. The previous relationship is now written thus:


$$L^{\mathbb{C}} = L \oplus iL. \tag{4.23}$$


An arbitrary linear transformation $\mathcal{A}$ on a vector space $L$ (as a space over the field
$\mathbb{R}$) can then be extended to all of $L^{\mathbb{C}}$ (as a space over the field $\mathbb{C}$). Namely, as follows
from the decomposition (4.23), every vector $x \in L^{\mathbb{C}}$ can be uniquely represented in
the form $x = u + iv$, where $u, v \in L$, and we set


$$\mathcal{A}^{\mathbb{C}}(x) = \mathcal{A}(u) + i\mathcal{A}(v). \tag{4.24}$$


We omit the obvious verification that the mapping $\mathcal{A}^{\mathbb{C}}$ defined by the relationship
(4.24) is a linear transformation of the space $L^{\mathbb{C}}$ (over the field $\mathbb{C}$). Moreover, it is
not difficult to prove that $\mathcal{A}^{\mathbb{C}}$ is the only linear transformation of the space $L^{\mathbb{C}}$ whose
restriction to $L$ coincides with $\mathcal{A}$, that is, for which the equality $\mathcal{A}^{\mathbb{C}}(x) = \mathcal{A}(x)$ is
satisfied for all $x$ in $L$.

The construction presented here may seem somewhat inelegant, since it uses
an isomorphism of the spaces $L$ and $\mathbb{R}^n$, for whose construction it is necessary to
choose some basis of $L$. Although in the majority of applications such a basis exists,
we shall give a construction that does not depend on the choice of basis. For this,
we recall that the space $L$ can be reconstructed from its dual space $L^*$ via the
isomorphism $L \simeq L^{**}$, which we constructed in Sect. 3.7. In other words, $L \simeq \mathcal{L}(L^*, \mathbb{R})$,
where as before, $\mathcal{L}(L, M)$ denotes the space of linear mappings $L \to M$ (here either
all spaces are considered complex or else they are all considered real).

We now consider C as a two-dimensional vector space over the field R and set 


$$L^{\mathbb{C}} = \mathcal{L}(L^*, \mathbb{C}), \tag{4.25}$$


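where in $\mathcal{L}(L^*, \mathbb{C})$, both spaces $L^*$ and $\mathbb{C}$ are considered real. Thus the relationship
(4.25) makes $L^{\mathbb{C}}$ into a vector space over the field $\mathbb{R}$. But we can convert
it into a space over the field $\mathbb{C}$ after defining multiplication of vectors in $L^{\mathbb{C}}$ by
complex scalars. Namely, if $\varphi \in \mathcal{L}(L^*, \mathbb{C})$ and $z \in \mathbb{C}$, then we set $z\varphi = \psi$, where
$\psi \in \mathcal{L}(L^*, \mathbb{C})$ is defined by the condition


$$\psi(f) = z \cdot \varphi(f) \quad \text{for all } f \in L^*.$$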


It is easily verified that $L^{\mathbb{C}}$ thus defined is a vector space over the field $\mathbb{C}$, and passage
from $L$ to $L^{\mathbb{C}}$ will be the same as described above, for an arbitrary choice of a basis of $L$
(that is, choice of the isomorphism $L \simeq \mathbb{R}^n$).

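If $\mathcal{A}$ is a linear transformation of the space $L$, then we shall define a corresponding
linear transformation $\mathcal{A}^{\mathbb{C}}$ of the space $L^{\mathbb{C}}$, after assigning to each vector $\psi \in L^{\mathbb{C}}$ the
value $\mathcal{A}^{\mathbb{C}}(\psi) \in L^{\mathbb{C}}$ using the relation


$$\bigl(\mathcal{A}^{\mathbb{C}}(\psi)\bigr)(f) = \psi\bigl(\mathcal{A}^*(f)\bigr) \quad \text{for all } f \in L^*,$$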




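where $\mathcal{A}^* : L^* \to L^*$ is the dual transformation to $\mathcal{A}$ (see p. 125). It is clear that
$\mathcal{A}^{\mathbb{C}}$ is indeed a linear transformation of the space $L^{\mathbb{C}}$, and its restriction to $L$
coincides with the transformation $\mathcal{A}$, that is, for every $\psi \in L$, the equality
$\mathcal{A}^{\mathbb{C}}(\psi)(f) = \mathcal{A}(\psi)(f)$ is satisfied for all $f \in L^*$.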


Definition 4.28 The complex vector space $L^{\mathbb{C}}$ is called the complexification of the
real vector space $L$, while the transformation $\mathcal{A}^{\mathbb{C}} : L^{\mathbb{C}} \to L^{\mathbb{C}}$ is the complexification
of the transformation $\mathcal{A} : L \to L$.


Remark 4.29 The construction presented above is applicable as well to a more general
situation: using it, it is possible to assign to any vector space $L$ over an arbitrary
field $K$ the space $L^{K'}$ over the bigger field $K' \supset K$, and to the linear transformation
$\mathcal{A}$ of the space $L$, the linear transformation $\mathcal{A}^{K'}$ of the space $L^{K'}$.


In the space $L^{\mathbb{C}}$ that we constructed, it will be useful to introduce the operation of
complex conjugation, which assigns to a vector $x \in L^{\mathbb{C}}$ the vector $\bar{x} \in L^{\mathbb{C}}$, either by
interpreting $L^{\mathbb{C}}$ as $\mathbb{C}^n$ (with which we began this section) and taking the complex conjugate
of each number in the row $x$, or (equivalently) by using (4.23) and setting $\bar{x} = u - iv$ for
$x = u + iv$. It is clear that


$$\overline{x + y} = \bar{x} + \bar{y}, \qquad \overline{(\alpha x)} = \bar{\alpha}\,\bar{x}$$


hold for all vectors $x, y \in L^{\mathbb{C}}$ and arbitrary complex scalar $\alpha$.

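The transformation $\mathcal{A}^{\mathbb{C}}$ obtained according to the rule (4.24) from a certain
transformation $\mathcal{A}$ of a real vector space $L$ will be called real. For a real transformation
$\mathcal{A}^{\mathbb{C}}$, we have the relationship


$$\overline{\mathcal{A}^{\mathbb{C}}(x)} = \mathcal{A}^{\mathbb{C}}(\bar{x}), \tag{4.26}$$


which follows from the definition (4.24) of a transformation $\mathcal{A}^{\mathbb{C}}$. Indeed, if we have
$x = u + iv$, then


$$\mathcal{A}^{\mathbb{C}}(x) = \mathcal{A}(u) + i\mathcal{A}(v), \qquad \overline{\mathcal{A}^{\mathbb{C}}(x)} = \mathcal{A}(u) - i\mathcal{A}(v).$$


On the other hand, $\bar{x} = u - iv$, from which follows $\mathcal{A}^{\mathbb{C}}(\bar{x}) = \mathcal{A}(u) - i\mathcal{A}(v)$ and
therefore (4.26).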
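
In coordinates, the rule (4.24) and the relationship (4.26) are easy to check with Python's built-in complex numbers. This is a minimal sketch, assuming $L = \mathbb{R}^2$ and a hypothetical real matrix `A` with hypothetical vectors `u`, `v`; the complexified transformation is then given by the same real matrix acting on $\mathbb{C}^2$.

```python
def apply(A, x):
    """Apply the matrix A (list of rows) to the vector x (real or complex)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def conj(x):
    """Complex conjugation of a vector of C^n, coordinate by coordinate."""
    return [c.conjugate() for c in x]


# Hypothetical real transformation of L = R^2 and vectors u, v in L.
A = [[1, -2],
     [1,  3]]
u = [1, 3]
v = [2, -1]
x = [complex(a, b) for a, b in zip(u, v)]      # x = u + i v

# Rule (4.24): A^C(x) = A(u) + i A(v).
by_definition = [complex(a, b) for a, b in zip(apply(A, u), apply(A, v))]
# The same real matrix acting on C^2 gives the same result.
by_matrix = apply(A, x)

# Relationship (4.26): conjugating A^C(x) equals applying A^C to the conjugate.
lhs = conj(apply(A, x))
rhs = apply(A, conj(x))

print(by_definition, by_matrix, lhs, rhs)
```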

Consider the linear transformation $\mathcal{A}$ of the real vector space $L$. To it there
corresponds, as shown above, the linear transformation $\mathcal{A}^{\mathbb{C}}$ of the complex vector space
$L^{\mathbb{C}}$. By Theorem 4.18, the transformation $\mathcal{A}^{\mathbb{C}}$ has an eigenvector $x \in L^{\mathbb{C}}$ for which,
therefore, one has the equality


$$\mathcal{A}^{\mathbb{C}}(x) = \lambda x, \tag{4.27}$$


where $\lambda$ is a root of the characteristic polynomial of the transformation $\mathcal{A}$ and,
generally speaking, is a certain complex number. We must distinguish two cases: $\lambda$
real and $\lambda$ complex.




Case 1: $\lambda$ is a real number. In this case, the characteristic polynomial of the
transformation $\mathcal{A}$ has a real root, and therefore $\mathcal{A}$ has an eigenvector in the space $L$; that
is, $L$ has a one-dimensional invariant subspace.


Case 2: $\lambda$ is a complex number. Let $\lambda = a + ib$, where $a$ and $b$ are real numbers,
$b \neq 0$. The eigenvector $x$ can also be written in the form $x = u + iv$, where the
vectors $u, v$ are in $L$. By assumption, $\mathcal{A}^{\mathbb{C}}(x) = \mathcal{A}(u) + i\mathcal{A}(v)$, and then relationship
(4.27), in view of the decomposition (4.23), gives


$$\mathcal{A}(v) = av + bu, \qquad \mathcal{A}(u) = -bv + au. \tag{4.28}$$


This means that the subspace $L' = \langle v, u \rangle$ of the space $L$ is invariant with respect to
the transformation $\mathcal{A}$. The dimension of the subspace $L'$ is equal to 2, and the vectors
$v, u$ form a basis of it. Indeed, it suffices to verify their linear independence. The
linear dependence of $v$ and $u$ would imply that $v = \xi u$ (or else that $u = \xi v$) for some
real $\xi$. But for $v = \xi u$, the second equality of (4.28) would yield the relationship
$\mathcal{A}(u) = (a - b\xi)u$, and that would imply that $u$ is a real eigenvector of the
transformation $\mathcal{A}$, with the real eigenvalue $a - b\xi$; that is, we are dealing with case 1. The
case $u = \xi v$ is similar.
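
The passage from a complex eigenvector to the real relations (4.28) can be traced on a concrete example. The sketch below is an illustration only: the real matrix `A`, its eigenvalue `lam` $= 2 + i$, and the eigenvector `x` are hypothetical values worked out by hand. Splitting `x` into real and imaginary parts $u$ and $v$ reproduces both equalities of (4.28) with $a = 2$, $b = 1$.

```python
def apply(A, x):
    """Apply the matrix A (list of rows) to the vector x (real or complex)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical real matrix with characteristic roots 2 + i and 2 - i.
A = [[1, -2],
     [1,  3]]
lam = complex(2, 1)
a, b = lam.real, lam.imag
x = [complex(2, 0), complex(-1, -1)]      # eigenvector for lam, found by hand

Ax = apply(A, x)
lam_x = [lam * c for c in x]              # equality (4.27): A^C(x) = lam * x

u = [c.real for c in x]                   # x = u + i v with u, v real
v = [c.imag for c in x]

Av = apply(A, v)
Au = apply(A, u)
rhs_v = [a * vi + b * ui for ui, vi in zip(u, v)]     # a v + b u
rhs_u = [-b * vi + a * ui for ui, vi in zip(u, v)]    # -b v + a u

print(Ax, lam_x)
print(Av, rhs_v, Au, rhs_u)
```

In the basis $v, u$ the restriction of the transformation therefore has the columns $(a, b)$ and $(-b, a)$, which is the matrix form discussed next.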


Uniting cases 1 and 2, we obtain another proof of Theorem 4.22. We observe
that in fact, we have now proved even more than what is asserted in that theorem.
Namely, we have shown that in the two-dimensional invariant subspace $L'$ there
exists a basis $v, u$ in which the transformation $\mathcal{A}$ gives the formula (4.28), that is, it
has a matrix of the form

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \qquad b \neq 0.$$


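Definition 4.30 A linear transformation $\mathcal{A}$ of a real vector space $L$ is said to be
block-diagonalizable if in some basis, its matrix has the form


$$\begin{pmatrix}
\alpha_1 & 0 & \cdots & \cdots & \cdots & 0 \\
0 & \ddots & \ddots & & & \vdots \\
\vdots & \ddots & \alpha_r & \ddots & & \vdots \\
\vdots & & \ddots & B_1 & \ddots & \vdots \\
\vdots & & & \ddots & \ddots & 0 \\
0 & \cdots & \cdots & \cdots & 0 & B_s
\end{pmatrix}, \tag{4.29}$$

where $\alpha_1, \ldots, \alpha_r$ are real matrices of order 1 (that is, real numbers), and $B_1, \ldots, B_s$
are real matrices of order 2 of the form


$$B_j = \begin{pmatrix} a_j & -b_j \\ b_j & a_j \end{pmatrix}, \qquad b_j \neq 0. \tag{4.30}$$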




Block-diagonalizable linear transformations are the real analogue of diagonaliz- 
able transformations of complex vector spaces. The connection between these two 
concepts is established in the following theorem. 


Theorem 4.31 A linear transformation $\mathcal{A}$ of a real vector space $L$ is
block-diagonalizable if and only if its complexification $\mathcal{A}^{\mathbb{C}}$ is a diagonalizable
transformation of the space $L^{\mathbb{C}}$.


Proof Suppose the linear transformation $\mathcal{A} : L \to L$ is block-diagonalizable. This
means that in some basis of the space $L$, its matrix has the form (4.29), which is
equivalent to the decomposition


$$L = L_1 \oplus \cdots \oplus L_r \oplus M_1 \oplus \cdots \oplus M_s, \tag{4.31}$$


where $L_i$ and $M_j$ are subspaces that are invariant with respect to the transformation
$\mathcal{A}$. In our case, $\dim L_i = 1$, so that $L_i = \langle e_i \rangle$ and $\mathcal{A}(e_i) = \alpha_i e_i$, and $\dim M_j = 2$,
where in some basis of the subspace $M_j$, the restriction of the transformation $\mathcal{A}$ to
$M_j$ has matrix of the form (4.30). Using formula (4.30), one is easily convinced that
the restriction of $\mathcal{A}^{\mathbb{C}}$ to the two-dimensional subspace $M_j$ has two distinct
complex-conjugate eigenvalues: $\lambda_j$ and $\bar{\lambda}_j$. If $f_j$ and $\bar{f}_j$ are the corresponding eigenvectors,
then in $L^{\mathbb{C}}$ there is a basis $e_1, \ldots, e_r, f_1, \bar{f}_1, \ldots, f_s, \bar{f}_s$ in which the matrix of the
transformation $\mathcal{A}^{\mathbb{C}}$ assumes the form

$$\begin{pmatrix}
\alpha_1 & & & & & & & 0 \\
& \ddots & & & & & & \\
& & \alpha_r & & & & & \\
& & & \lambda_1 & & & & \\
& & & & \bar{\lambda}_1 & & & \\
& & & & & \ddots & & \\
& & & & & & \lambda_s & \\
0 & & & & & & & \bar{\lambda}_s
\end{pmatrix}. \tag{4.32}$$


This means that the transformation $\mathcal{A}^{\mathbb{C}}$ is diagonalizable.
Now suppose, conversely, that $\mathcal{A}^{\mathbb{C}}$ is diagonalizable, that is, in some basis of the
space $L^{\mathbb{C}}$, the transformation $\mathcal{A}^{\mathbb{C}}$ has the diagonal matrix


$$\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}. \tag{4.33}$$

Among the numbers $\lambda_1, \ldots, \lambda_n$ may be found some that are real and some that are
complex. All the numbers $\lambda_i$ are roots of the characteristic polynomial of the
transformation $\mathcal{A}^{\mathbb{C}}$. But clearly (by the definition of $L^{\mathbb{C}}$), any basis of the real vector
space $L$ is a basis of the complex space $L^{\mathbb{C}}$, and in such a basis, the matrices of the
transformations $\mathcal{A}$ and $\mathcal{A}^{\mathbb{C}}$ coincide. That is, the matrix of the transformation $\mathcal{A}^{\mathbb{C}}$
is real in some basis. This means that its characteristic polynomial has real coefficients.
It then follows from well-known properties of real polynomials that if among
the numbers $\lambda_1, \ldots, \lambda_n$ some are complex, then they come in conjugate pairs $\lambda_j$ and
$\bar{\lambda}_j$, and moreover, $\lambda_j$ and $\bar{\lambda}_j$ occur the same number of times. We may assume that
in the matrix of (4.33), the first $r$ numbers are real: $\lambda_i = \alpha_i \in \mathbb{R}$ ($i \le r$), while the
remainder are complex, and moreover, $\lambda_j$ and $\bar{\lambda}_j$ ($j > r$) are adjacent to each other. In
this case, the matrix of the transformation assumes the form (4.32). Along with each
eigenvector $e$ of the transformation $\mathcal{A}^{\mathbb{C}}$, the space $L^{\mathbb{C}}$ contains a vector $\bar{e}$. Moreover,
if $e$ has the eigenvalue $\lambda$, then $\bar{e}$ has the eigenvalue $\bar{\lambda}$. This follows easily from the
fact that $\mathcal{A}^{\mathbb{C}}$ is a real transformation and from the relationship
$(L^{\mathbb{C}})_{\bar{\lambda}} = \overline{(L^{\mathbb{C}})_{\lambda}}$, which
can be easily verified. Therefore, we may write down the basis in which the
transformation $\mathcal{A}^{\mathbb{C}}$ has the form (4.32) in the form $e_1, \ldots, e_r, f_1, \bar{f}_1, \ldots, f_s, \bar{f}_s$, where
all $e_i$ are in $L$.

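Let us set $f_j = u_j + iv_j$, where $u_j, v_j \in L$, and let us consider the subspace
$N_j = \langle u_j, v_j \rangle$. It is clear that $N_j$ is invariant with respect to $\mathcal{A}$, and by formula
(4.28), the restriction of $\mathcal{A}$ to the subspace $N_j$ gives a transformation that in the
basis $u_j, v_j$ has matrix of the form (4.30). We therefore see that


$$L^{\mathbb{C}} = \langle e_1 \rangle \oplus \cdots \oplus \langle e_r \rangle \oplus i\langle e_1 \rangle \oplus \cdots \oplus i\langle e_r \rangle \oplus N_1 \oplus iN_1 \oplus \cdots \oplus N_s \oplus iN_s,$$

from which follows the decomposition

$$L = \langle e_1 \rangle \oplus \cdots \oplus \langle e_r \rangle \oplus N_1 \oplus \cdots \oplus N_s,$$


analogous to (4.31). This shows that the transformation $\mathcal{A} : L \to L$ is
block-diagonalizable.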


Similarly, using the notion of complexification, it is possible to prove real
analogues of Theorems 4.14, 4.18, and 4.21.


4.4 Orientation of a Real Vector Space 


The real line has two directions: to the left and to the right (from an arbitrarily
chosen point, taken as the origin). Analogously, in real three-dimensional space, there
are two directions for traveling around a point: clockwise and counterclockwise. We 
shall consider analogous concepts in an arbitrary real vector space (of finite dimen- 
sion). 

Let $e_1, \ldots, e_n$ and $e'_1, \ldots, e'_n$ be two bases of a real vector space $L$. Then there
exists a linear transformation $\mathcal{A} : L \to L$ such that


$$\mathcal{A}(e_i) = e'_i, \quad i = 1, \ldots, n. \tag{4.34}$$




It is clear that for the given pair of bases, there exists only one such linear
transformation $\mathcal{A}$, and moreover, it is nonsingular ($|\mathcal{A}| \neq 0$).


Definition 4.32 Two bases $e_1, \ldots, e_n$ and $e'_1, \ldots, e'_n$ are said to have the same
orientation if the transformation $\mathcal{A}$ satisfying the condition (4.34) is proper ($|\mathcal{A}| > 0$;
recall Definition 4.4), and to be oppositely oriented if $\mathcal{A}$ is improper ($|\mathcal{A}| < 0$).
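
In coordinates, this definition reduces to comparing the signs of two determinants. The sketch below rests on an assumption not made in the text: each basis is given by the coordinates of its vectors relative to some fixed reference basis, so that the matrix of the transformation in (4.34) is the product of one coordinate matrix with the inverse of the other, and its determinant is the quotient of the two determinants. The function names are hypothetical.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (fine for small n)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def same_orientation(basis1, basis2):
    """True if the transformation taking basis1 to basis2 has positive determinant.

    Each basis is a list of coordinate vectors. The determinant of the
    transformation equals det(basis2) / det(basis1), so only signs matter.
    """
    d1, d2 = det(basis1), det(basis2)
    if d1 == 0 or d2 == 0:
        raise ValueError("the given vectors do not form a basis")
    return d1 * d2 > 0


standard = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
swapped = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]    # two vectors interchanged
cyclic = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]     # cyclic permutation of the vectors

print(same_orientation(standard, swapped))   # False
print(same_orientation(standard, cyclic))    # True
```

Writing the vectors as rows rather than columns does not matter here, since transposition leaves the determinant unchanged.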


Theorem 4.33 The property of having the same orientation induces an equivalence 
relation on the set of all bases of the vector space L. 


Proof The definition of equivalence relation (on an arbitrary set) was given on
page xii, and to prove the theorem, we have only to verify symmetry and transitivity,
since reflexivity is completely obvious (for the mapping $\mathcal{A}$, take the identity
transformation $\mathcal{E}$). Since the transformation $\mathcal{A}$ is nonsingular, it follows that relationship
(4.34) can be written in the form $\mathcal{A}^{-1}(e'_i) = e_i$, $i = 1, \ldots, n$, from which follows
the symmetry property of bases having the same orientation: the transformation $\mathcal{A}$
is replaced by $\mathcal{A}^{-1}$, where here $|\mathcal{A}^{-1}| = |\mathcal{A}|^{-1}$, and the sign of the determinant
remains the same.


Let bases $e_1, \ldots, e_n$ and $e'_1, \ldots, e'_n$ have the same orientation, and suppose bases
$e'_1, \ldots, e'_n$ and $e''_1, \ldots, e''_n$ also have the same orientation. By definition, this means
that the transformations $\mathcal{A}$, from (4.34), and $\mathcal{B}$, defined by

$$\mathcal{B}(e'_i) = e''_i, \quad i = 1, \ldots, n, \tag{4.35}$$


are proper. Replacing in (4.35) the expressions for the vectors $e'_i$ from (4.34), we
obtain


$$\mathcal{B}\mathcal{A}(e_i) = e''_i, \quad i = 1, \ldots, n,$$


and since $|\mathcal{B}\mathcal{A}| = |\mathcal{B}| \cdot |\mathcal{A}|$, the transformation $\mathcal{B}\mathcal{A}$ is also proper, that is, the bases
$e_1, \ldots, e_n$ and $e''_1, \ldots, e''_n$ have the same orientation, which completes the proof of
transitivity.


We shall denote the set of all bases of the space $L$ by $\mathfrak{E}$. Theorem 4.33 then
tells us that the property of having the same orientation decomposes the set $\mathfrak{E}$ into
two equivalence classes, that is, we have the decomposition $\mathfrak{E} = \mathfrak{E}_1 \cup \mathfrak{E}_2$, where
$\mathfrak{E}_1 \cap \mathfrak{E}_2 = \varnothing$. To obtain this decomposition in practice, we may proceed as follows:
Choose in $L$ an arbitrary basis $e_1, \ldots, e_n$ and denote by $\mathfrak{E}_1$ the collection of all bases
that have the same orientation as the chosen basis, and let $\mathfrak{E}_2$ denote the collection
of bases with the opposite orientation. Theorem 4.33 tells us that this decomposition
of $\mathfrak{E}$ does not depend on which basis $e_1, \ldots, e_n$ we choose. We can assert that
any two bases appearing together in one of the two subsets $\mathfrak{E}_1$ and $\mathfrak{E}_2$ have the
same orientation, and if they belong to different subsets, then they have opposite
orientations.


Definition 4.34 The choice of one of the subsets $\mathfrak{E}_1$ and $\mathfrak{E}_2$ is called an orientation
of the vector space $L$. Once an orientation has been chosen, the bases lying in the
chosen subset are said to be positively oriented, while those in the other subset are
called negatively oriented.


As can be seen from this definition, the selection of an orientation of a vector
space depends on an arbitrary choice: it would have been equally possible to have
called the positively oriented bases negatively oriented, and vice versa. It is no
accident that in practical applications, the actual choice of orientation is frequently
based on an appeal to something external, such as the structure of the human body
(left and right) or the motion of the Sun in the heavens (clockwise or counterclockwise).

The crucial part of the theory presented in this section is that there is a connection 
between orientation and certain topological concepts (such as those presented in the 
introduction to this book; see p. xvii). 

To pursue this idea, we must first of all define convergence for sequences of
elements of the set $\mathfrak{E}$. We shall do so by introducing on the set $\mathfrak{E}$ a metric, that
is, by converting it into a metric space. This means that we must define a function
$r(x, y)$ for all $x, y \in \mathfrak{E}$ taking real values and satisfying properties 1–3 introduced
on p. xvii. We begin by defining a metric $r(A, B)$ on the set $\mathfrak{A}$ of square matrices of
a given order $n$ with real entries.

For a matrix $A = (a_{ij})$ in $\mathfrak{A}$, we let the number $\mu(A)$ equal the maximum absolute
value of its entries:


$$\mu(A) = \max_{i,j = 1, \ldots, n} |a_{ij}|. \tag{4.36}$$


Lemma 4.35 The function $\mu(A)$ defined by relationship (4.36) exhibits the following
properties:


(a) $\mu(A) > 0$ for $A \neq O$ and $\mu(A) = 0$ for $A = O$.
(b) $\mu(A + B) \le \mu(A) + \mu(B)$ for all $A, B \in \mathfrak{A}$.
(c) $\mu(AB) \le n\mu(A)\mu(B)$ for all $A, B \in \mathfrak{A}$.


Proof Property (a) obviously follows from the definition (4.36), while property (b)
follows from an analogous inequality for numbers: $|a_{ij} + b_{ij}| \le |a_{ij}| + |b_{ij}|$. It
remains to prove property (c). Let $A = (a_{ij})$, $B = (b_{ij})$, and $C = AB = (c_{ij})$. Then
$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$, and so


$$|c_{ij}| \le \sum_{k=1}^{n} |a_{ik}| |b_{kj}| \le \sum_{k=1}^{n} \mu(A)\mu(B) = n\mu(A)\mu(B).$$


From this it follows that $\mu(C) \le n\mu(A)\mu(B)$.
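
A short numerical check of properties (b) and (c) may be helpful, including an example showing that the factor $n$ in (c) cannot be dropped. This is a minimal sketch; the matrices are hypothetical values chosen for illustration, and the function name `mu` simply mirrors the notation $\mu(A)$ of (4.36).

```python
def mu(A):
    """Maximum absolute value of the entries of A, as in (4.36)."""
    return max(abs(a) for row in A for a in row)


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


n = 2
A = [[1, -2],
     [3,  4]]
B = [[0, 5],
     [-1, 2]]

sum_ok = mu(mat_add(A, B)) <= mu(A) + mu(B)          # property (b)
prod_ok = mu(mat_mul(A, B)) <= n * mu(A) * mu(B)     # property (c)

# With all entries equal to 1, property (c) holds with equality,
# and the bound without the factor n would fail.
J = [[1, 1],
     [1, 1]]
mu_JJ = mu(mat_mul(J, J))

print(sum_ok, prod_ok, mu_JJ, mu(J) * mu(J))
```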


We can now convert the set $\mathfrak{A}$ into a metric space by setting for every pair of
matrices $A$ and $B$ in $\mathfrak{A}$,


$$r(A, B) = \mu(A - B). \tag{4.37}$$


Properties 1–3 introduced in the definition of a metric follow from the definitions in
(4.36) and (4.37) and properties (a) and (b) proved in Lemma 4.35.




A metric on $\mathfrak{A}$ enables us to introduce a metric on the set $\mathfrak{E}$ of bases of a vector
space $L$. Let us fix a distinguished basis $e_1, \ldots, e_n$ and define the number $r(x, y)$
for two arbitrary bases $x$ and $y$ in the set $\mathfrak{E}$ as follows. Suppose the bases $x$ and $y$
consist of vectors $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$, respectively. Then there exist linear
transformations $\mathcal{A}$ and $\mathcal{B}$ of the space $L$ such that


$$\mathcal{A}(e_i) = x_i, \quad \mathcal{B}(e_i) = y_i, \quad i = 1, \ldots, n. \tag{4.38}$$


The transformations $\mathcal{A}$ and $\mathcal{B}$ are nonsingular, and by condition (4.38), they are
uniquely determined. Let us denote by $A$ and $B$ the matrices of the transformations
$\mathcal{A}$ and $\mathcal{B}$ in the basis $e_1, \ldots, e_n$, and set


$$r(x, y) = r(A, B), \tag{4.39}$$


where $r(A, B)$ is as defined above by relationship (4.37). Properties 1–3 in the
definition of a metric hold for $r(x, y)$ from analogous properties of the metric $r(A, B)$.

However, here a difficulty arises: The definition of the metric $r(x, y)$ by
relationship (4.39) depends on the choice of some basis $e_1, \ldots, e_n$ of the space $L$. Let
us choose another basis $e'_1, \ldots, e'_n$, and let us see how the metric $r'(x, y)$ that
results differs from $r(x, y)$. To this end, we use the familiar fact that for two bases
$e_1, \ldots, e_n$ and $e'_1, \ldots, e'_n$, there exists a unique linear (and in addition, nonsingular)
transformation $\mathcal{C} : L \to L$ taking the first basis into the second:


$$e'_i = \mathcal{C}(e_i), \quad i = 1, \ldots, n. \tag{4.40}$$


Formulas (4.38) and (4.40) show that for the linear transformations $\tilde{\mathcal{A}} = \mathcal{A}\mathcal{C}^{-1}$ and
$\tilde{\mathcal{B}} = \mathcal{B}\mathcal{C}^{-1}$, one has the equality

$$\tilde{\mathcal{A}}(e'_i) = x_i, \quad \tilde{\mathcal{B}}(e'_i) = y_i, \quad i = 1, \ldots, n. \tag{4.41}$$

Let us denote by $A'$ and $B'$ the matrices of the transformations $\mathcal{A}$ and $\mathcal{B}$ in the basis
$e'_1, \ldots, e'_n$, and by $\tilde{A}$ and $\tilde{B}$ the matrices of the transformations $\tilde{\mathcal{A}}$ and $\tilde{\mathcal{B}}$ in this
basis. Let $C$ be the matrix of the transformation $\mathcal{C}$, that is, by (4.40), the transition
matrix from the basis $e'_1, \ldots, e'_n$ to the basis $e_1, \ldots, e_n$. Then the matrices $A'$, $\tilde{A}$ and
$B'$, $\tilde{B}$ are related by $\tilde{A} = A'C^{-1}$ and $\tilde{B} = B'C^{-1}$. Furthermore, we observe that $A$
and $A'$ are matrices of the same transformation $\mathcal{A}$ in two different bases ($e_1, \ldots, e_n$
and $e'_1, \ldots, e'_n$), and similarly, $B$ and $B'$ are matrices of the single transformation $\mathcal{B}$.
Therefore, by the formula for changing coordinates, we have $A' = C^{-1}AC$ and
$B' = C^{-1}BC$, and so as a result, we obtain the relationship


$$\tilde{A} = A'C^{-1} = C^{-1}A, \qquad \tilde{B} = B'C^{-1} = C^{-1}B. \tag{4.42}$$


Returning to the definition (4.39) of a metric on $\mathfrak{E}$, we see that $r'(x, y) = r(\tilde{A}, \tilde{B})$.
Substituting in the last relationship the expression (4.42) for the matrices $\tilde{A}$ and $\tilde{B}$, and
taking into account definition (4.37) and property (c) from Lemma 4.35, we obtain


$$\begin{aligned}
r'(x, y) &= r(\tilde{A}, \tilde{B}) = r(C^{-1}A, C^{-1}B) \\
&= \mu\bigl(C^{-1}(A - B)\bigr) \le n\mu(C^{-1})\mu(A - B) = \alpha\, r(x, y),
\end{aligned}$$


where the number $\alpha = n\mu(C^{-1})$ does not depend on the bases $x$ and $y$, but only
on $e_1, \ldots, e_n$ and $e'_1, \ldots, e'_n$. Since the last two bases play a symmetric role in our
construction, we may obtain analogously a second inequality $r(x, y) \le \beta r'(x, y)$ with
a certain positive constant $\beta$. The relationship


$$r'(x, y) \le \alpha\, r(x, y), \qquad r(x, y) \le \beta\, r'(x, y), \qquad \alpha, \beta > 0, \tag{4.43}$$


shows that although the metrics $r(x, y)$ and $r'(x, y)$ defined in terms of different
bases $e_1, \ldots, e_n$ and $e'_1, \ldots, e'_n$ are different, nevertheless, on the set $\mathfrak{E}$, the notion
of convergence is the same for both metrics. To put this more formally, having chosen
in $\mathfrak{E}$ two different bases and having with the help of these bases defined metrics
$r(x, y)$ and $r'(x, y)$ on $\mathfrak{E}$, we have thereby defined two different metric spaces $\mathfrak{E}'$
and $\mathfrak{E}''$ with one and the same underlying set $\mathfrak{E}$ but with different metrics $r$ and $r'$
defined on it. Here the identity mapping of the space $\mathfrak{E}$ onto itself is not an isometry
of $\mathfrak{E}'$ and $\mathfrak{E}''$, but by relationship (4.43), it is a homeomorphism. We may therefore
speak about continuous mappings, paths in $\mathfrak{E}$, and its connected components without
specifying precisely which metric we are using.

Let us move on to the question whether two bases of the set $\mathfrak{E}$ can be continuously
deformed into each other (see the general definition on p. xx). This question reduces
to whether there is a continuous deformation between the nonsingular matrices $A$
and $B$ corresponding to these bases under the selection of some auxiliary basis
$e_1, \ldots, e_n$ (just as with other topological concepts, continuous deformability does
not depend on the choice of the auxiliary basis). We wish to emphasize that the
condition of nonsingularity of the matrices $A$ and $B$ plays here an essential role.

We shall formulate the notion of continuous deformability for matrices in a certain
set $\mathfrak{A}$ (which in our case will be the set of nonsingular matrices).


Definition 4.36 A matrix $A$ is said to be continuously deformable into a matrix $B$
if there exists a family of matrices $A(t)$ in $\mathfrak{A}$ whose elements depend continuously
on a parameter $t \in [0, 1]$ such that $A(0) = A$ and $A(1) = B$.


It is obvious that this property of matrices being continuously deformable into
each other defines an equivalence relation on the set $\mathfrak{A}$. By definition, we need to
verify that the properties of reflexivity, symmetry, and transitivity are satisfied. The
verification of all these properties is simple and given on p. xx.

Let us note one additional property of continuous deformability in the case that
the set $\mathfrak{A}$ has another property: for two arbitrary matrices belonging to $\mathfrak{A}$, their
product also belongs to $\mathfrak{A}$. It is clear that this property is satisfied if $\mathfrak{A}$ is the set of
nonsingular matrices (in subsequent chapters, we shall meet other examples of such
sets).




Lemma 4.37 If a matrix $A$ is continuously deformable into $B$, and $C \in \mathfrak{A}$ is an
arbitrary matrix, then $AC$ is continuously deformable into $BC$, and $CA$ is continuously
deformable into $CB$.


Proof By the condition of the lemma, we have a family $A(t)$ of matrices in $\mathfrak{A}$,
where $t \in [0, 1]$, effecting a continuous deformation of $A$ into $B$. To prove the first
assertion, we take the family $A(t)C$, and for the second, the family $CA(t)$. These
families produce the deformations that we require.


Theorem 4.38 Two nonsingular square matrices of the same order with real ele- 
ments are continuously deformable into each other if and only if the signs of their 
determinants are the same. 


Proof Let $A$ and $B$ be the matrices described in the statement of the theorem. The
necessary condition that the determinants $|A|$ and $|B|$ be of the same sign is obvious.
Indeed, in view of the formula for the expansion of the determinant (Sect. 2.7) or else
by its inductive definition (Sect. 2.2), it is clear that the determinant is a polynomial
in the elements of the matrix, and consequently, $|A(t)|$ is a continuous function of $t$.
But a continuous function taking values with opposite signs at the endpoints of an
interval must take the value zero at some point within the interval, while at the same
time, the condition $|A(t)| \neq 0$ must be satisfied for all $t \in [0, 1]$.

Let us prove the sufficiency of the condition, first for matrices with
$|A| > 0$. We shall show that $A$ is continuously deformable into the identity matrix $E$.
By Theorem 2.62, the matrix $A$ can be represented as a product of matrices $U_{ij}(c)$,
$S_k$, and a diagonal matrix. The matrix $U_{ij}(c)$ is continuously deformable into the
identity: as the family $A(t)$, we may take the matrices $U_{ij}(ct)$. Since the $S_k$ are
themselves diagonal matrices, we see that (in view of Lemma 4.37) the matrix $A$
is continuously deformable into a diagonal matrix $D$, and from the assumption
$|A| > 0$ and the part of the theorem already proved, it follows that $|D| > 0$.

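Let

$$D = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix}.$$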


Every element $d_i$ can be represented in the form $\varepsilon_i p_i$, where $\varepsilon_i = 1$ or $-1$, while
$p_i > 0$. The matrix $(p_i)$ of order 1 for $p_i > 0$ can be continuously deformed into
$(1)$. For this, it suffices to set $A(t) = (a(t))$, where $a(t) = t + (1 - t) p_i$ for $t \in [0, 1]$.
Therefore, the matrix $D$ is continuously deformable into the matrix $D'$, in which all
$d_i = \varepsilon_i p_i$ are replaced by $\varepsilon_i$. As we have seen, from this it follows that $|D'| > 0$,
that is, the number of $-1$'s on the main diagonal is even. Let us combine them in
pairs. If there is $-1$ in the $i$th and $j$th places, then we recall that the matrix


$$\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \tag{4.44}$$




defines in the plane the central symmetry transformation with respect to the origin,
that is, a rotation through the angle $\pi$. If we set

$$A(t) = \begin{pmatrix} \cos \pi t & -\sin \pi t \\ \sin \pi t & \cos \pi t \end{pmatrix},$$

then we obtain the matrix of rotation through the angle $\pi t$, which as $t$ changes from
0 to 1 passes continuously from the identity to the matrix (4.44); by symmetry, this
gives a continuous deformation of the matrix (4.44) into the identity. It is
clear that we thus obtain a continuous deformation of the matrix $D'$ into $E$.
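
The rotation family above can be checked numerically. The following is a minimal Python sketch, assuming 2 × 2 matrices stored as nested lists; the helper names `rotation` and `det2` and the sampling grid are illustrative choices, not notation from the text. It samples the path and records the determinant, which should stay equal to 1 (in particular nonzero) along the whole deformation.

```python
import math

def rotation(t):
    """Matrix of rotation through the angle pi*t, as a 2x2 nested list."""
    c, s = math.cos(math.pi * t), math.sin(math.pi * t)
    return [[c, -s], [s, c]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Sample the path at a few parameter values (the grid size is arbitrary).
samples = [rotation(k / 10) for k in range(11)]
dets = [det2(m) for m in samples]

start, end = samples[0], samples[-1]   # identity at t = 0, matrix (4.44) at t = 1
```

Up to rounding error, `start` is the identity, `end` is the matrix (4.44), and every entry of `dets` equals 1, so the path never leaves the set of nonsingular matrices.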

Denoting continuous deformability by ~, we can write down three relationships: 
A~ D, D~ D’, D’ ~ E, from which follows by transitivity that A ~ E. From 
this follows as well the assertion of Theorem 4.38 for two matrices A and B with 
|A| > 0 and |B| > 0. 

In order to take care of matrices $A$ with $|A| < 0$, we introduce the function
$\varepsilon(A) = +1$ if $|A| > 0$ and $\varepsilon(A) = -1$ if $|A| < 0$. It is clear that $\varepsilon(AB) = \varepsilon(A)\varepsilon(B)$.
If $\varepsilon(A) = \varepsilon(B) = -1$, then let us set $A^{-1}B = C$. Then $\varepsilon(C) = 1$, and by what was
proved previously, $C \sim E$. By Lemma 4.37, multiplying on the left by $A$, it follows
that $B = AC \sim AE = A$, and by symmetry, we have $A \sim B$.


Taking into account the results of Sect. 3.4 and Lemma 4.37, from Theorem 4.38, 
we obtain the following result. 


Theorem 4.39 Two nonsingular linear transformations of a real vector space are 
continuously deformable into each other if and only if the signs of their determinants 
are the same. 


Theorem 4.40 Two bases of a real vector space are continuously deformable into 
each other if and only if they have the same orientation. 


Recalling the topological notions introduced earlier of path-connectedness and 
path-connected component (p. xx), we see that the results we have obtained can be 
formulated as follows. The set $\mathfrak{A}$ of nonsingular matrices of a given order (or linear
transformations of the space $L$ into itself) can be represented as the union of two
path-connected components corresponding to positive and negative determinants.
Similarly, the set $\mathfrak{E}$ of all bases of a space $L$ can be represented as the union of two
path-connected components consisting of positively and negatively oriented bases.


Chapter 5 
Jordan Normal Form 


5.1 Principal Vectors and Cyclic Subspaces 


In the previous chapter, we studied linear transformations of real and complex vector 
spaces into themselves, and in particular, we found conditions under which a linear 
transformation of a complex vector space is diagonalizable, that is, has a diagonal 
matrix (consisting of eigenvectors of the transformation) in some specially chosen 
basis. We showed there that not all transformations of a complex vector space are 
diagonalizable. 

The goal of this chapter is a more complete study of linear transformations of a 
real or complex vector space to itself, including the investigation of nondiagonal- 
izable transformations. In this chapter as before, we shall denote a vector space by 
L and assume that it is finite-dimensional. Moreover, in Sects. 5.1 to 5.3, we shall 
consider linear transformations of complex vector spaces only. 

As already noted, the diagonalizable linear transformations are the simplest class 
of transformations. However, since this class does not cover all linear transforma- 
tions, we would like to find a construction that generalizes the construction of di- 
agonalizable linear transformations, and indeed so general as to encompass all lin- 
ear transformations. A transformation can be brought into diagonal form if there is 
a basis consisting of the transformation’s eigenvectors. Therefore, let us begin by 
generalizing the notion of eigenvector. 

Let us recall that an eigenvector $e \neq 0$ of a linear transformation $\mathcal{A} : L \to L$ with
eigenvalue $\lambda$ satisfies the condition $\mathcal{A}(e) = \lambda e$, or equivalently, the equality

$$(\mathcal{A} - \lambda\mathcal{E})(e) = 0.$$

A natural generalization of this is contained in the following definition. 


Definition 5.1 A nonnull vector $e$ is said to be a principal vector of a linear
transformation $\mathcal{A} : L \to L$ with eigenvalue $\lambda$ if for some natural number $m$, the following
condition is satisfied:

$$(\mathcal{A} - \lambda\mathcal{E})^m(e) = 0. \tag{5.1}$$


I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry,
DOI 10.1007/978-3-642-30994-6_5, © Springer-Verlag Berlin Heidelberg 2013




The smallest natural number m for which relation (5.1) is satisfied is called the 
grade of the principal vector e. 


Example 5.2 An eigenvector is a principal vector of grade 1. 


Example 5.3 Let $L$ be the vector space of polynomials $x(t)$ of degree at most $n - 1$,
and let $\mathcal{A}$ be the linear transformation that maps every function $x(t)$ to its derivative
$x'(t)$. Then

$$\mathcal{A}(x(t)) = x'(t), \qquad \mathcal{A}^k(x(t)) = x^{(k)}(t).$$

Since $(t^k)^{(k)} = k! \neq 0$ and $(t^k)^{(k+1)} = 0$, it is obvious that the polynomial $x(t) = t^k$
is a principal vector of the transformation $\mathcal{A}$ of grade $k + 1$ corresponding to the
eigenvalue $\lambda = 0$.
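
Example 5.3 can be reproduced with a short computation. The sketch below assumes that a polynomial is stored as its list of coefficients, with the entry at index $j$ multiplying $t^j$; the function names `derivative` and `grade_for_zero_eigenvalue` are hypothetical helpers introduced only for illustration. It differentiates repeatedly and counts the steps needed to reach the zero polynomial.

```python
def derivative(coeffs):
    """Differentiate a polynomial given by coefficients [c0, c1, ..., c_{n-1}]
    (c_j multiplies t**j); the result is padded back to the same length."""
    n = len(coeffs)
    return [j * coeffs[j] for j in range(1, n)] + [0]

def grade_for_zero_eigenvalue(coeffs):
    """Smallest m with A^m(x) = 0, where A is differentiation (eigenvalue 0).
    Returns None for the zero polynomial, which is not a principal vector."""
    if not any(coeffs):
        return None
    m = 0
    while any(coeffs):
        coeffs = derivative(coeffs)
        m += 1
    return m

# Polynomials of degree at most n - 1 = 4, and x(t) = t**3, so k = 3.
t_cubed = [0, 0, 0, 1, 0]
third_derivative = derivative(derivative(derivative(t_cubed)))  # constant 3! = 6
grade = grade_for_zero_eigenvalue(t_cubed)                      # k + 1 = 4
```

For $k = 3$ the third derivative is the constant $3! = 6$ and the fourth vanishes, so the computed grade is $k + 1 = 4$, in agreement with the example.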


Definition 5.4 Let $e$ be a principal vector of grade $m$ corresponding to the
eigenvalue $\lambda$. The subspace $M$ spanned by the vectors

$$e, \quad (\mathcal{A} - \lambda\mathcal{E})(e), \quad \ldots, \quad (\mathcal{A} - \lambda\mathcal{E})^{m-1}(e), \tag{5.2}$$

is called the cyclic subspace generated by the vector $e$.


Example 5.5 If $m = 1$, then a cyclic subspace is the one-dimensional subspace $\langle e \rangle$
generated by the eigenvector $e$.


Example 5.6 In Example 5.3, the cyclic subspace generated by the principal vector
$x(t) = t^k$ consists of all polynomials of degree at most $k$.


Theorem 5.7 A cyclic subspace $M \subset L$ generated by the principal vector $e$ of grade
$m$ is invariant under the transformation $\mathcal{A}$ and has dimension $m$.


Proof Since the cyclic subspace M is spanned by m vectors (5.2), its dimension is 
obviously at most m. We shall prove that the vectors (5.2) are linearly independent, 
which will imply that dimM = m. 

Let 


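$$\alpha_1 e + \alpha_2 (\mathcal{A} - \lambda\mathcal{E})(e) + \cdots + \alpha_m (\mathcal{A} - \lambda\mathcal{E})^{m-1}(e) = 0. \tag{5.3}$$

Let us apply the linear transformation $(\mathcal{A} - \lambda\mathcal{E})^{m-1}$ to both sides of this equality.
Since by definition (5.1) of a principal vector, we have $(\mathcal{A} - \lambda\mathcal{E})^m(e) = 0$, then a
fortiori, $(\mathcal{A} - \lambda\mathcal{E})^k(e) = 0$ for every $k \geq m$. We therefore obtain that

$$\alpha_1 (\mathcal{A} - \lambda\mathcal{E})^{m-1}(e) = 0,$$

and since $(\mathcal{A} - \lambda\mathcal{E})^{m-1}(e) \neq 0$, in view of the fact that $e$ is of grade $m$, we have the
equality $\alpha_1 = 0$. Relationship (5.3) now takes the following form:

$$\alpha_2 (\mathcal{A} - \lambda\mathcal{E})(e) + \cdots + \alpha_m (\mathcal{A} - \lambda\mathcal{E})^{m-1}(e) = 0. \tag{5.4}$$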




Applying the linear transformation $(\mathcal{A} - \lambda\mathcal{E})^{m-2}$ to both sides of equality (5.4),
we prove in exactly the same way that $\alpha_2 = 0$. Continuing further in this way, we
obtain that in relationship (5.3), all the coefficients $\alpha_1, \ldots, \alpha_m$ are equal to zero.
Consequently, the vectors (5.2) are linearly independent, and so we have $\dim M = m$.

We shall now prove the invariance of the cyclic subspace M associated with the 
transformation A. Let us set 


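$$e_1 = e, \quad e_2 = (\mathcal{A} - \lambda\mathcal{E})(e), \quad \ldots, \quad e_m = (\mathcal{A} - \lambda\mathcal{E})^{m-1}(e). \tag{5.5}$$

Since all vectors of the subspace $M$ can be expressed as linear combinations of the
vectors $e_1, \ldots, e_m$, it suffices to prove that the vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_m)$ can be
expressed as linear combinations of $e_1, \ldots, e_m$. But from relationships (5.1) and
(5.5), it is clear that

$$(\mathcal{A} - \lambda\mathcal{E})(e_1) = e_2, \quad (\mathcal{A} - \lambda\mathcal{E})(e_2) = e_3, \quad \ldots, \quad (\mathcal{A} - \lambda\mathcal{E})(e_m) = 0,$$

that is,

$$\mathcal{A}(e_1) = \lambda e_1 + e_2, \quad \mathcal{A}(e_2) = \lambda e_2 + e_3, \quad \ldots, \quad \mathcal{A}(e_m) = \lambda e_m, \tag{5.6}$$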


which establishes the assertion of the theorem. 


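Corollary 5.8 The vectors $e_1, \ldots, e_m$ defined by formula (5.5) form a basis of the
cyclic subspace $M$ generated by the principal vector $e$. The matrix of the restriction
of the linear transformation $\mathcal{A}$ to the subspace $M$ in this basis has the form

$$A = \begin{pmatrix} \lambda & 0 & 0 & \cdots & 0 \\ 1 & \lambda & 0 & \cdots & 0 \\ 0 & 1 & \lambda & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & \lambda \end{pmatrix}. \tag{5.7}$$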


This is an obvious consequence of (5.6). 
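
A small computation makes the shape of (5.7) concrete. The sketch below assumes matrices stored as nested lists whose columns hold the coordinates of the images of the basis vectors; the names `jordan_block`, `apply`, and `shifted` are illustrative helpers, and the values $\lambda = 2$, $m = 4$ are an arbitrary choice. It builds the block and follows the chain $e, (\mathcal{A} - \lambda\mathcal{E})(e), \ldots$ in coordinates.

```python
def jordan_block(lam, m):
    """m x m matrix with lam on the diagonal and 1 just below it, as in (5.7)."""
    return [[lam if i == j else 1 if i == j + 1 else 0 for j in range(m)]
            for i in range(m)]

def apply(matrix, vector):
    """Multiply a square matrix by a column vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def shifted(matrix, lam):
    """Return matrix - lam * E."""
    return [[a - (lam if i == j else 0) for j, a in enumerate(row)]
            for i, row in enumerate(matrix)]

lam, m = 2, 4
A = jordan_block(lam, m)
N = shifted(A, lam)        # matrix of A - lam*E restricted to M
e1 = [1, 0, 0, 0]          # coordinates of e_1 = e in the basis (5.5)

chain = [e1]
for _ in range(m):
    chain.append(apply(N, chain[-1]))
# chain[k] holds the coordinates of (A - lam*E)^k (e) for k = 0, ..., m
```

Here `chain[0]`, ..., `chain[3]` are the coordinate vectors of $e_1, \ldots, e_4$ and `chain[4]` is the null vector, which mirrors relations (5.6): each basis vector is sent to $\lambda$ times itself plus the next one, and the last is an eigenvector.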


Theorem 5.9 Let $M$ be a cyclic subspace generated by the principal vector $e$ of
grade $m$ with eigenvalue $\lambda$. Then an arbitrary vector $y \in M$ can be written in the
form

$$y = f(\mathcal{A})(e),$$

where $f$ is a polynomial of degree at most $m - 1$. If the polynomial $f(t)$ is not
divisible by $t - \lambda$, then the vector $y$ is also a principal vector of grade $m$ and generates
the same cyclic subspace $M$.




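Proof The first assertion of the theorem follows at once from the fact that by the
definition of a cyclic subspace, every vector $y \in M$ has the form

$$y = \alpha_1 e + \alpha_2 (\mathcal{A} - \lambda\mathcal{E})(e) + \cdots + \alpha_m (\mathcal{A} - \lambda\mathcal{E})^{m-1}(e), \tag{5.8}$$

that is, $y = f(\mathcal{A})(e)$, where the polynomial $f(t)$ is given by

$$f(t) = \alpha_1 + \alpha_2 (t - \lambda) + \cdots + \alpha_m (t - \lambda)^{m-1}.$$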


Let us prove the second assertion. Let $y = f(\mathcal{A})(e)$. Then $(\mathcal{A} - \lambda\mathcal{E})^m(y) = 0$.
Indeed, from the relationships $y = f(\mathcal{A})(e)$ and (5.1) and taking into account the
property established earlier that two arbitrary polynomials in one and the same linear
transformation commute (a consequence of Lemma 4.16 in Sect. 4.1; see p. 142),
we obtain the equality

$$(\mathcal{A} - \lambda\mathcal{E})^m(y) = (\mathcal{A} - \lambda\mathcal{E})^m f(\mathcal{A})(e) = f(\mathcal{A})(\mathcal{A} - \lambda\mathcal{E})^m(e) = 0.$$


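Let us assume that the polynomial $f(t)$ is not divisible by $t - \lambda$. This implies
that the coefficient $\alpha_1$ is nonzero. We shall show that we then must have
$(\mathcal{A} - \lambda\mathcal{E})^{m-1}(y) \neq 0$. Applying the linear transformation $(\mathcal{A} - \lambda\mathcal{E})^{m-1}$ to the vectors on
both sides of equality (5.8), we obtain

$$\begin{aligned}
(\mathcal{A} - \lambda\mathcal{E})^{m-1}(y)
&= \alpha_1 (\mathcal{A} - \lambda\mathcal{E})^{m-1}(e) + \alpha_2 (\mathcal{A} - \lambda\mathcal{E})^{m}(e) + \cdots + \alpha_m (\mathcal{A} - \lambda\mathcal{E})^{2m-2}(e) \\
&= \alpha_1 (\mathcal{A} - \lambda\mathcal{E})^{m-1}(e),
\end{aligned}$$

since we have $(\mathcal{A} - \lambda\mathcal{E})^k(e) = 0$ for every $k \geq m$. From this last relationship and
taking into account the conditions $\alpha_1 \neq 0$ and $(\mathcal{A} - \lambda\mathcal{E})^{m-1}(e) \neq 0$, it follows that
$(\mathcal{A} - \lambda\mathcal{E})^{m-1}(y) \neq 0$. Therefore, the vector $y$ is also a principal vector of the linear
transformation $\mathcal{A}$ of grade $m$.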

Finally, we shall prove that the cyclic subspaces $M$ and $M'$ generated by principal
vectors $e$ and $y$ coincide. It is clear that $M' \subset M$, since $y \in M$, and in view of the
invariance of the cyclic subspace $M$, the vector $(\mathcal{A} - \lambda\mathcal{E})^k(y)$ for arbitrary $k$ is
also contained in $M$. But from Theorem 5.7, it follows that $\dim M = \dim M' = m$,
and therefore, by Theorem 3.24, the inclusion $M' \subset M$ implies simply the equality
$M' = M$.
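
In the basis (5.5), the coordinates of $y$ are exactly the coefficients $\alpha_1, \ldots, \alpha_m$ of (5.8), so the second assertion of Theorem 5.9 says that $y$ has grade $m$ precisely when its first coordinate is nonzero. The following Python sketch checks this on a cyclic subspace of dimension 3; the helper names `apply` and `grade` and the sample vectors are assumptions made for illustration.

```python
def apply(matrix, vector):
    """Multiply a square matrix by a column vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def grade(nilpotent, vector):
    """Smallest m with nilpotent^m (vector) = 0, or None for the null vector."""
    if not any(vector):
        return None
    for m in range(1, len(vector) + 1):
        vector = apply(nilpotent, vector)
        if not any(vector):
            return m
    return None  # never annihilated: the matrix was not nilpotent on this vector

# Matrix of A - lam*E on a cyclic subspace of dimension 3, in the basis (5.5).
N = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]

y_full = [5, -1, 4]   # alpha_1 = 5 != 0: f(t) is not divisible by t - lam
y_div = [0, 2, 7]     # alpha_1 = 0:      f(t) is divisible by t - lam

grade_full = grade(N, y_full)
grade_div = grade(N, y_div)
```

The first vector has grade 3 and therefore generates the whole subspace, while the second has grade 2 and lies in the image of $\mathcal{A} - \lambda\mathcal{E}$.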


Corollary 5.10 In the notation of Theorem 5.9, for an arbitrary vector $y \in M$ and
scalar $\mu \neq \lambda$, we have the representation $y = (\mathcal{A} - \mu\mathcal{E})(z)$ for some vector $z \in M$.
Furthermore, we have the following: either $y$ is a principal vector of grade $m$ that
generates the cyclic subspace $M$, or else $y = (\mathcal{A} - \lambda\mathcal{E})(z)$ for some vector $z \in M$.


Proof The matrix of the restriction of the linear transformation $\mathcal{A}$ to the subspace $M$
in the basis $e_1, \ldots, e_m$ from (5.5) has the form (5.7). From this, it is easily seen that
for arbitrary $\mu \neq \lambda$, the determinant of the restriction of the linear transformation
$\mathcal{A} - \mu\mathcal{E}$ to $M$ is nonzero (it is equal to $(\lambda - \mu)^m$). From Theorems 3.69 and 3.70, it
follows that the restriction of $\mathcal{A} - \mu\mathcal{E}$ to $M$ is an isomorphism $M \to M$, and its image
is $(\mathcal{A} - \mu\mathcal{E})(M) = M$; that is, for an arbitrary vector $y \in M$, there exists a vector
$z \in M$ such that $y = (\mathcal{A} - \mu\mathcal{E})(z)$.

By Theorem 5.9, a vector $y$ can be represented in the form $y = f(\mathcal{A})(e)$, and
moreover, if the polynomial $f(t)$ is not divisible by $t - \lambda$, then $y$ is a principal
vector of grade $m$ generating the cyclic subspace $M$. But if $f(t)$ is divisible by $t - \lambda$,
that is, $f(t) = (t - \lambda) g(t)$ for some polynomial $g(t)$, then setting $z = g(\mathcal{A})(e)$, we
obtain the required representation $y = (\mathcal{A} - \lambda\mathcal{E})(z)$.


5.2 Jordan Normal Form (Decomposition) 


For the proof of the major result of this section and indeed of the entire chapter—the 
theorem on the decomposition of a complex vector space as a direct sum of cyclic 
subspaces—we require the following lemma. 


Lemma 5.11 For an arbitrary linear transformation $\mathcal{A} : L \to L$ of a complex vector
space of dimension $n$, there exist a scalar $\lambda$ and an $(n - 1)$-dimensional subspace $L' \subset L$
invariant with respect to the transformation $\mathcal{A}$ such that for every vector $x \in L$, we have the
equality

$$\mathcal{A}(x) = \lambda x + y, \quad \text{where } y \in L'. \tag{5.9}$$


Proof By Theorem 4.18, every linear transformation of a complex vector space has
an eigenvector and associated eigenvalue. Let $\lambda$ be an eigenvalue of the
transformation $\mathcal{A}$. Then the transformation $\mathcal{B} = \mathcal{A} - \lambda\mathcal{E}$ is singular (it annihilates the
eigenvector), and by Theorem 3.72, its image $\mathcal{B}(L)$ is a subspace $M \subset L$ of dimension
$m < n$.

Let $e_1, \ldots, e_m$ be a basis of $M$. We shall extend it arbitrarily to a basis of $L$ by
means of the vectors $e_{m+1}, \ldots, e_n$. It is clear that the subspace

$$L' = \langle e_1, \ldots, e_m, e_{m+1}, \ldots, e_{n-1} \rangle$$

has dimension $n - 1$ and includes $M$, since $e_1, \ldots, e_m \in M$.

Let us now prove equality (5.9). Consider an arbitrary vector $x \in L$. Then we
have $\mathcal{B}(x) \in \mathcal{B}(L) = M$, which implies that $\mathcal{B}(x) \in L'$, since $M \subset L'$. Recalling that
$\mathcal{A} = \mathcal{B} + \lambda\mathcal{E}$, we obtain that $\mathcal{A}(x) = \mathcal{B}(x) + \lambda x$, and moreover, by our construction,
the vector $y = \mathcal{B}(x)$ is in $L'$. From this, the invariance of the subspace $L'$ easily
follows. Indeed, if $x \in L'$, then in equality (5.9), we have not only $y \in L'$, but also
$\lambda x \in L'$, which yields that $\mathcal{A}(x) \in L'$ as well.


The main result of this section (the decomposition theorem) is the following. 


Theorem 5.12 A finite-dimensional complex vector space $L$ can be decomposed
as a direct sum of cyclic subspaces relative to an arbitrary linear transformation
$\mathcal{A} : L \to L$.




Proof The proof will be by induction on the dimension $n = \dim L$. It is based on the
lemma proved above, and we shall use the same notation. Let $L' \subset L$ be the same
$(n - 1)$-dimensional subspace invariant with respect to the transformation $\mathcal{A}$ that
was discussed in Lemma 5.11.

We choose any vector $e' \notin L'$. If $f_1, \ldots, f_{n-1}$ is any basis of the subspace $L'$,
then the vectors $f_1, \ldots, f_{n-1}, e'$ form a basis of $L$. Indeed, there are $n = \dim L$
vectors, and so it suffices to prove their linear independence. Let us suppose that

$$\alpha_1 f_1 + \cdots + \alpha_{n-1} f_{n-1} + \beta e' = 0. \tag{5.10}$$

If $\beta \neq 0$, then from this equality, it would follow that $e' \in L'$. Therefore, $\beta = 0$, and
then from equality (5.10), by the linear independence of the vectors $f_1, \ldots, f_{n-1}$,
it follows that $\alpha_1 = \cdots = \alpha_{n-1} = 0$.

We shall rely on the freedom that remains in the choice of the vector $e' \in L$. So
far, it has had to satisfy only the single condition $e' \notin L'$, but it is not difficult to see that
every vector $e'' = e' + x$, where $x \in L'$, satisfies the same condition, and this means
that any such vector could have been chosen in place of $e'$. Indeed, if $e'' \in L'$, then
considering that $x \in L'$, we would have $e' \in L'$, contradicting the assumption.

It is obvious that Theorem 5.12 is true for $n = 1$. Therefore, by the induction
hypothesis, we may assume that it holds as well for the subspace $L'$. Let

$$L' = L_1 \oplus \cdots \oplus L_r \tag{5.11}$$

be the decomposition of $L'$ as a sum of cyclic subspaces, and moreover, suppose that
each cyclic subspace $L_i$ is generated by its principal vector $e_i$ of grade $m_i$ associated
with the eigenvalue $\lambda_i$ and has the basis

$$e_i, \quad (\mathcal{A} - \lambda_i\mathcal{E})(e_i), \quad \ldots, \quad (\mathcal{A} - \lambda_i\mathcal{E})^{m_i - 1}(e_i). \tag{5.12}$$

By Theorem 5.7, it follows that $\dim L_i = m_i$ and $n - 1 = m_1 + \cdots + m_r$.
For the vector $e'$ chosen at the start of the proof, we have, by the lemma, the
equality

$$\mathcal{A}(e') = \lambda e' + y, \quad \text{where } y \in L'.$$

In view of the decomposition (5.11), this vector $y$ can be written in the form

$$y = y_1 + \cdots + y_r, \tag{5.13}$$

where $y_i \in L_i$. Thanks to Corollary 5.10, we may assert that the vector $y_i$ either can
be written in the form $(\mathcal{A} - \lambda\mathcal{E})(z_i)$ for some $z_i \in L_i$, or is a principal vector of
grade $m_i$ associated with the eigenvalue $\lambda$. Changing if necessary the numeration of
the vectors $y_i$, we may write

$$(\mathcal{A} - \lambda\mathcal{E})(e') = (\mathcal{A} - \lambda\mathcal{E})(z) + y_s + \cdots + y_r, \tag{5.14}$$

where $z = z_1 + \cdots + z_{s-1}$, $z_i \in L_i$ for all $i = 1, \ldots, s - 1$, and each of the vectors
$y_j$ with indices $j = s, \ldots, r$ generates the cyclic subspace $L_j$.


Here there are two possible cases. 


Case 1. In formula (5.14), we have $s - 1 = r$, that is,

$$(\mathcal{A} - \lambda\mathcal{E})(e') = (\mathcal{A} - \lambda\mathcal{E})(z), \quad z \in L'.$$

Using the freedom in the choice of the vector $e'$ discussed above, we set $e'' = e' - z$. Then from
the previous relationship, we obtain

$$(\mathcal{A} - \lambda\mathcal{E})(e'') = 0.$$

By definition, this implies that $e''$ is an eigenvector with eigenvalue $\lambda$. Consider the
one-dimensional subspace $L_{r+1} = \langle e'' \rangle$. It is clear that it is cyclic, and moreover,

$$L = L' \oplus L_{r+1} = L_1 \oplus \cdots \oplus L_r \oplus L_{r+1}.$$

Theorem 5.12 has been proved in this case.


Case 2. In formula (5.14), we have $s - 1 < r$. We again set $e'' = e' - z$. Then from
(5.14), we obtain that

$$(\mathcal{A} - \lambda\mathcal{E})(e'') = y_s + \cdots + y_r, \tag{5.15}$$

where by construction, each $y_j$, $j = s, \ldots, r$, is a principal vector of grade $m_j$
corresponding to the eigenvalue $\lambda$ generating the cyclic subspace $L_j$.

It is clear that we can always order the vectors $y_s, \ldots, y_r$ in such a way that
$m_s \leq \cdots \leq m_r$. Let us assume that this condition is satisfied. We shall prove that the
vector $e''$ is a principal vector of grade $m_r + 1$ with associated eigenvalue $\lambda$, and we
shall show that we then have the following decomposition:

$$L = L_1 \oplus \cdots \oplus L_{r-1} \oplus L'_r, \tag{5.16}$$

where $L'_r$ is a cyclic subspace generated by the vector $e''$. It is clear that from this
will follow the assertion of Theorem 5.12. From the equality (5.15), it follows that

$$(\mathcal{A} - \lambda\mathcal{E})^{m_r + 1}(e'') = (\mathcal{A} - \lambda\mathcal{E})^{m_r}(y_s) + \cdots + (\mathcal{A} - \lambda\mathcal{E})^{m_r}(y_r). \tag{5.17}$$

Since the principal vectors $y_i$, $i = s, \ldots, r$, have grades $m_i$, and since by our
assumption, all the $m_i$ are less than or equal to $m_r$, it follows that $(\mathcal{A} - \lambda\mathcal{E})^{m_r}(y_i) = 0$
for all $i = s, \ldots, r$. From this, taking into account (5.17), it follows that
$(\mathcal{A} - \lambda\mathcal{E})^{m_r + 1}(e'') = 0$. In just the same way, we obtain that

$$(\mathcal{A} - \lambda\mathcal{E})^{m_r}(e'') = (\mathcal{A} - \lambda\mathcal{E})^{m_r - 1}(y_s) + \cdots + (\mathcal{A} - \lambda\mathcal{E})^{m_r - 1}(y_r). \tag{5.18}$$

The terms on the right-hand side of this sum belong to the subspaces $L_s, \ldots, L_r$. If
we had the equality

$$(\mathcal{A} - \lambda\mathcal{E})^{m_r}(e'') = 0,$$

then it would follow that all the terms on the right-hand side of (5.18) would be
equal to zero, since the subspaces $L_s, \ldots, L_r$ form a direct sum. In particular, we
would obtain that $(\mathcal{A} - \lambda\mathcal{E})^{m_r - 1}(y_r) = 0$, and this would contradict that the
principal vector $y_r$ has grade $m_r$. We therefore conclude that $(\mathcal{A} - \lambda\mathcal{E})^{m_r}(e'') \neq 0$, and
consequently, the principal vector $e''$ has grade $m_r + 1$.

It remains to prove relationship (5.16). We observe that the dimensions of the
spaces $L_1, \ldots, L_{r-1}$ are equal to $m_1, \ldots, m_{r-1}$, while the dimension of $L'_r$ is equal
to $m_r + 1$. Therefore, from the equality $n - 1 = m_1 + \cdots + m_r$ obtained after (5.12),
it follows that the sum of the dimensions
of the terms on the right-hand side of (5.16) equals the dimension of the left-hand
side. Therefore, in order to prove the relationship (5.16), it suffices by Corollary 3.40
(p. 96) to prove that an arbitrary vector in the space $L$ can be represented as the sum
of vectors from the subspaces $L_1, \ldots, L_{r-1}, L'_r$.

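It suffices to prove this last assertion for all vectors in a certain basis of the
space $L$. Such a basis is obtained in particular if we combine the vector $e''$ and the
vectors of certain bases of the subspaces $L_1, \ldots, L_r$. For the vector $e''$, this assertion
is obvious, since $e'' \in L'_r$. In just the same way, the assertion is clear for any vector
in the basis of one of the subspaces $L_1, \ldots, L_{r-1}$. It remains to prove this for vectors
in some basis of the subspace $L_r$. Such a basis, for example, comprises the vectors

$$y_r, \quad (\mathcal{A} - \lambda\mathcal{E})(y_r), \quad \ldots, \quad (\mathcal{A} - \lambda\mathcal{E})^{m_r - 1}(y_r).$$

From (5.15), it follows that

$$y_r = -(y_s + \cdots + y_{r-1}) + (\mathcal{A} - \lambda\mathcal{E})(e''),$$

and this means that

$$(\mathcal{A} - \lambda\mathcal{E})^k(y_r) = -(\mathcal{A} - \lambda\mathcal{E})^k(y_s) - \cdots - (\mathcal{A} - \lambda\mathcal{E})^k(y_{r-1}) + (\mathcal{A} - \lambda\mathcal{E})^{k+1}(e'')$$

for all $k = 1, \ldots, m_r - 1$. And this establishes what we needed to show: since

$$y_s \in L_s, \quad \ldots, \quad y_{r-1} \in L_{r-1}, \quad e'' \in L'_r,$$

and since the spaces $L_s, \ldots, L_{r-1}$ and $L'_r$ are invariant, it follows that

$$(\mathcal{A} - \lambda\mathcal{E})^k(y_s) \in L_s, \quad \ldots, \quad (\mathcal{A} - \lambda\mathcal{E})^k(y_{r-1}) \in L_{r-1}, \quad (\mathcal{A} - \lambda\mathcal{E})^{k+1}(e'') \in L'_r.$$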


This completes the proof of Theorem 5.12. 


Let us note that in the passage from the subspace $L'$ to $L$ for a given $\lambda$, the
decomposition into cyclic subspaces changes in the following way: either in the
decomposition there appears one more one-dimensional subspace (case 1), or else the
dimension of one of the cyclic subspaces increases by 1 (case 2).

Let the decomposition into a direct sum of subspaces, whose existence is
established by Theorem 5.12, have the form

$$L = L_1 \oplus \cdots \oplus L_r.$$




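In each of the subspaces $L_i$, we will select a basis of the form (5.5) and combine
them into a single basis $e_1, \ldots, e_n$ of the space $L$. In this basis, the matrix $A$ of the
transformation $\mathcal{A}$ has the block-diagonal form

$$A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix}, \tag{5.19}$$

where the matrices $A_i$ have (by Corollary 5.8) the form

$$A_i = \begin{pmatrix} \lambda_i & 0 & 0 & \cdots & 0 \\ 1 & \lambda_i & 0 & \cdots & 0 \\ 0 & 1 & \lambda_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & \lambda_i \end{pmatrix}. \tag{5.20}$$

The matrix $A$ given by formulas (5.19) and (5.20) is said to be in Jordan normal
form, while the matrices $A_i$ are called Jordan blocks. We therefore have the
following result, which is nothing more than a reformulation of Theorem 5.12.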


Theorem 5.13 For every linear transformation of a finite-dimensional complex vec- 
tor space, there exists a basis of that space in which the matrix of the transformation 
is in Jordan normal form. 


Corollary 5.14 Every complex matrix is similar to a matrix in Jordan normal form. 


Proof As we saw in Chap. 3, an arbitrary square matrix $A$ of order $n$ is the matrix of
some linear transformation $\mathcal{A} : L \to L$ in some basis $e_1, \ldots, e_n$. By Theorem 5.13,
in some other basis $e'_1, \ldots, e'_n$, the matrix $A'$ of the transformation $\mathcal{A}$ is in Jordan
normal form. As established in Sect. 3.4, the matrices $A$ and $A'$ are related by the
relationship (3.43), for some nonsingular matrix $C$ (the transition matrix from the
first basis to the second). This implies that the matrices $A$ and $A'$ are similar.


5.3 Jordan Normal Form (Uniqueness) 


We shall now explore the extent to which the decomposition of the vector space $L$ as
a direct sum of cyclic subspaces relative to a given linear transformation $\mathcal{A} : L \to L$
is unique. First of all, let us remark that in such a decomposition

$$L = L_1 \oplus \cdots \oplus L_r, \tag{5.21}$$




the subspaces $L_i$ themselves are in no way uniquely determined. The simplest
example of this is the identity transformation $\mathcal{A} = \mathcal{E}$. For this transformation, every
nonnull vector is an eigenvector, which means that every one-dimensional subspace
is a cyclic subspace generated by a principal vector of grade 1. Therefore, any
decomposition of the space $L$ as a direct sum of one-dimensional subspaces is a
decomposition as a direct sum of cyclic subspaces, and such a decomposition exists
for every basis of the space $L$; that is, there are infinitely many of them.

However, we shall prove that the eigenvalues $\lambda_i$ and the dimensions of the cyclic
subspaces associated with these numbers coincide for every possible decomposition
(5.21). As we have seen, the Jordan normal form is determined solely by the
eigenvalues $\lambda_i$ and the dimensions of the associated subspaces (see formulas (5.19) and
(5.20)). This will give us the uniqueness of the Jordan normal form.


Theorem 5.15 The Jordan normal form of a linear transformation is completely
determined by the transformation itself up to the ordering of the Jordan blocks. In
other words, for the decomposition (5.21) of a vector space $L$ as a direct sum of
subspaces that are cyclic for some linear transformation $\mathcal{A} : L \to L$, the eigenvalues
$\lambda_i$ and dimensions $m_i$ of the associated cyclic subspaces $L_i$ depend only on the
transformation $\mathcal{A}$ and are the same for all decompositions (5.21).


Proof Let $\lambda$ be some eigenvalue of the linear transformation $\mathcal{A}$ and let (5.21) be one
possible decomposition. Let us denote by $l_m$ ($m = 1, 2, \ldots$) the integer that indicates
how many $m$-dimensional cyclic subspaces associated with $\lambda$ are encountered in
(5.21).

We shall give a method for calculating $l_m$ based on $\lambda$ and $\mathcal{A}$ only. This will prove
that this number in fact does not depend on the decomposition (5.21).

Let us apply to both sides of equality (5.21) the transformation $(\mathcal{A} - \lambda\mathcal{E})^i$ with
some $i \geq 1$. It is clear that

$$(\mathcal{A} - \lambda\mathcal{E})^i(L) = (\mathcal{A} - \lambda\mathcal{E})^i(L_1) \oplus \cdots \oplus (\mathcal{A} - \lambda\mathcal{E})^i(L_r). \tag{5.22}$$


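We shall now determine the dimensions of the subspaces $(\mathcal{A} - \lambda\mathcal{E})^i(L_k)$. In the
course of proving the corollary to Theorem 5.9 (Corollary 5.10), we established that
for arbitrary $\mu \neq \lambda$, the restriction of the linear transformation $\mathcal{A} - \mu\mathcal{E}$ to $M$ is an
isomorphism, and its image $(\mathcal{A} - \mu\mathcal{E})(M)$ is equal to $M$. Therefore, if $L_k$ corresponds
to the number $\lambda_k \neq \lambda$, then

$$(\mathcal{A} - \lambda\mathcal{E})^i(L_k) = L_k, \quad \lambda_k \neq \lambda. \tag{5.23}$$

But if $\lambda_k = \lambda$, then choosing in $L_k$ the basis $e, (\mathcal{A} - \lambda\mathcal{E})(e), \ldots, (\mathcal{A} - \lambda\mathcal{E})^{m_k - 1}(e)$,
where $m_k = \dim L_k$, that is, it is equal to the grade of the principal vector $e$, we
obtain that if $i \geq m_k$, then the subspace $(\mathcal{A} - \lambda\mathcal{E})^i(L_k)$ consists solely of the null
vector, while if $i < m_k$, then

$$(\mathcal{A} - \lambda\mathcal{E})^i(L_k) = \langle (\mathcal{A} - \lambda\mathcal{E})^i(e), \ldots, (\mathcal{A} - \lambda\mathcal{E})^{m_k - 1}(e) \rangle,$$

and moreover, the vectors $(\mathcal{A} - \lambda\mathcal{E})^i(e), \ldots, (\mathcal{A} - \lambda\mathcal{E})^{m_k - 1}(e)$ are linearly
independent. Therefore, in the case $\lambda_k = \lambda$, we obtain the formula

$$\dim (\mathcal{A} - \lambda\mathcal{E})^i(L_k) = \begin{cases} 0, & \text{if } i \geq m_k, \\ m_k - i, & \text{if } i < m_k. \end{cases} \tag{5.24}$$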


Let us denote by $n'$ the sum of the dimensions of those subspaces $L_k$ that
correspond to the numbers $\lambda_k \neq \lambda$. Then from formulas (5.22)–(5.24), it follows that

$$\dim (\mathcal{A} - \lambda\mathcal{E})^i(L) = l_{i+1} + 2 l_{i+2} + \cdots + (p - i)\, l_p + n', \tag{5.25}$$

where $p$ is the maximal dimension of a cyclic subspace associated with the given
value $\lambda$ in the decomposition (5.21). Indeed, from the equality (5.22), we obtain that

$$\dim (\mathcal{A} - \lambda\mathcal{E})^i(L) = \dim (\mathcal{A} - \lambda\mathcal{E})^i(L_1) + \cdots + \dim (\mathcal{A} - \lambda\mathcal{E})^i(L_r). \tag{5.26}$$

It follows from formula (5.23) that the terms $\dim (\mathcal{A} - \lambda\mathcal{E})^i(L_k)$ with $\lambda_k \neq \lambda$ in the
sum give $n'$. In view of formula (5.24), the terms $\dim (\mathcal{A} - \lambda\mathcal{E})^i(L_k)$ with $\lambda_k = \lambda$
and $m_k \leq i$ are equal to zero. Furthermore, from the same formula (5.24), it follows
that if $m_k = i + 1$, then $\dim (\mathcal{A} - \lambda\mathcal{E})^i(L_k) = 1$, and the number of subspaces $L_k$
of dimension $m_k = i + 1$ will be equal to $l_{i+1}$ by the definition of the number $l_m$.
Therefore, in formula (5.26), the number of terms equal to 1 will be $l_{i+1}$. Similarly,
the number of subspaces $L_k$ of dimension $m_k = i + 2$ will be equal to $l_{i+2}$, but with this,
we already have $\dim (\mathcal{A} - \lambda\mathcal{E})^i(L_k) = 2$, whence on the right-hand side of (5.25),
there appears the term $2 l_{i+2}$, and so on. From this follows the equality (5.25).

Let us recall that in Sect. 3.6, we defined the notion of the rank $\operatorname{rk} \mathcal{B}$ of an
arbitrary linear transformation $\mathcal{B} : L \to L$. Here, $\operatorname{rk} \mathcal{B}$ coincides with the dimension
of the image $\mathcal{B}(L)$ and is equal to the rank of the matrix $B$ of this transformation,
regardless of the basis $e_1, \ldots, e_n$ in terms of which the matrix of the transformation
is written.

Let us now set $r_i = \operatorname{rk} (\mathcal{A} - \lambda\mathcal{E})^i$ for $i = 1, \ldots, p$. Let us write the relationships
(5.25) for $i = 1, \ldots, p$ by taking into account the fact that

$$\dim (\mathcal{A} - \lambda\mathcal{E})^i(L) = \operatorname{rk} (\mathcal{A} - \lambda\mathcal{E})^i = r_i \quad \text{and} \quad l_s = 0 \text{ for } s > p,$$

and let us consider also the equality

$$n = l_1 + 2 l_2 + \cdots + p\, l_p + n',$$




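which follows from formula (5.21) or from (5.25) for $i = 0$. As a result, we obtain
the relationships

$$\begin{aligned}
l_1 + 2 l_2 + 3 l_3 + \cdots + p\, l_p + n' &= n, \\
l_2 + 2 l_3 + \cdots + (p - 1)\, l_p + n' &= r_1, \\
&\;\;\vdots \\
l_p + n' &= r_{p-1}, \\
n' &= r_p,
\end{aligned}$$

from which it is possible to express $l_1, \ldots, l_p$ in terms of $r_1, \ldots, r_p$.
Indeed, subtracting from each equation the one following it, we obtain

$$\begin{aligned}
l_1 + l_2 + \cdots + l_p &= n - r_1, \\
l_2 + \cdots + l_p &= r_1 - r_2, \\
&\;\;\vdots \\
l_p &= r_{p-1} - r_p.
\end{aligned} \tag{5.27}$$

Repeating this same operation, we obtain

$$\begin{aligned}
l_1 &= n - 2 r_1 + r_2, \\
l_2 &= r_1 - 2 r_2 + r_3, \\
&\;\;\vdots \\
l_{p-1} &= r_{p-2} - 2 r_{p-1} + r_p, \\
l_p &= r_{p-1} - r_p.
\end{aligned} \tag{5.28}$$

From these relationships, it follows that the numbers $l_i$ are determined by the
numbers $r_i$, which means that they depend only on the transformation $\mathcal{A}$.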
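
The passage from ranks to block counts can be carried out mechanically. The sketch below is one possible implementation under stated assumptions: matrices are nested lists of integers or rationals, ranks are computed exactly with `fractions.Fraction`, and the helper names `rank`, `matmul`, and `block_counts` are ours. It uses the single formula $l_m = r_{m-1} - 2 r_m + r_{m+1}$ with $r_0 = n$, which agrees with (5.28) because the ranks stabilize, $r_{p+1} = r_p$, once $i \geq p$.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by Gaussian elimination over the rationals (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def block_counts(matrix, lam):
    """Return {m: l_m} using l_m = r_{m-1} - 2 r_m + r_{m+1}, with r_0 = n."""
    n = len(matrix)
    shift = [[a - (lam if i == j else 0) for j, a in enumerate(row)]
             for i, row in enumerate(matrix)]
    ranks = [n]                                   # r_0 = n
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(n + 1):                        # r_1, ..., r_{n+1}
        power = matmul(power, shift)
        ranks.append(rank(power))
    counts = {}
    for m in range(1, n + 1):
        l_m = ranks[m - 1] - 2 * ranks[m] + ranks[m + 1]
        if l_m:
            counts[m] = l_m
    return counts

# Blocks of orders 3 and 1 for the eigenvalue 2, and one block of order 1 for 5.
A = [[2, 0, 0, 0, 0],
     [1, 2, 0, 0, 0],
     [0, 1, 2, 0, 0],
     [0, 0, 0, 2, 0],
     [0, 0, 0, 0, 5]]

counts_2 = block_counts(A, 2)
counts_5 = block_counts(A, 5)
counts_7 = block_counts(A, 7)   # 7 is not an eigenvalue
```

For this matrix the ranks for $\lambda = 2$ are $r_1 = 3$, $r_2 = 2$, $r_3 = 1$, which gives $l_1 = 1$ and $l_3 = 1$; for a number that is not an eigenvalue all ranks equal $n$ and every $l_m$ vanishes.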


Corollary 5.16 In the decomposition (5.21), the subspace associated with the
number $\lambda$ occurs if and only if $\lambda$ is an eigenvalue of the transformation $\mathcal{A}$.


Proof Indeed, if $\lambda$ is not an eigenvalue, then the transformation $\mathcal{A} - \lambda\mathcal{E}$ is
nonsingular, and this means that the transformations $(\mathcal{A} - \lambda\mathcal{E})^i$ are nonsingular as well.
In other words, $r_i = n$ for all $i = 1, 2, \ldots$. From the formulas (5.27), it then
follows that all $l_i$ are equal to 0, that is, in the decomposition (5.21), there are no
subspaces associated with $\lambda$. Conversely, if all $l_i = 0$, then from (5.28), we obtain that
$r_n = r_{n-1} = \cdots = r_1 = n$. But the equality $r_1 = n$ means precisely that the
transformation $\mathcal{A} - \lambda\mathcal{E}$ is nonsingular.




Corollary 5.17 Square matrices $A$ and $B$ of order $n$ are similar if and only if their
eigenvalues coincide and for each eigenvalue $\lambda$ and each $i \leq n$, we have

$$\operatorname{rk} (A - \lambda E)^i = \operatorname{rk} (B - \lambda E)^i. \tag{5.29}$$


Proof The necessity of conditions (5.29) is obvious, since if $A$ and $B$ are similar,
then so are the matrices $(A - \lambda E)^i$ and $(B - \lambda E)^i$, which means that their ranks are
the same.

We now prove sufficiency. Suppose that the conditions (5.29) are satisfied. We
shall construct transformations $\mathcal{A} : L \to L$ and $\mathcal{B} : L \to L$ having in some basis
$e_1, \ldots, e_n$ of the vector space $L$ the matrices $A$ and $B$. Let the transformation $\mathcal{A}$
be brought into Jordan normal form in some basis $f_1, \ldots, f_n$, and the same for $\mathcal{B}$
in some basis $g_1, \ldots, g_n$. In view of equality (5.29) and using formulas (5.25), we
conclude that these Jordan forms coincide. This means that the matrices $A$ and $B$
are similar to some third matrix, and consequently, by transitivity, they are similar
to each other.
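
Condition (5.29) lends itself to a direct check once the common eigenvalues are known. The following Python sketch assumes that the list of eigenvalues is supplied by the caller (it does not compute them) and that the entries are integers or rationals; the helper names `rank`, `matmul`, `shifted_power_ranks`, and `satisfies_5_29` are illustrative.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by Gaussian elimination over the rationals (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def shifted_power_ranks(matrix, lam):
    """Ranks of (matrix - lam*E)^i for i = 1, ..., n."""
    n = len(matrix)
    shift = [[a - (lam if i == j else 0) for j, a in enumerate(row)]
             for i, row in enumerate(matrix)]
    power, ranks = shift, []
    for _ in range(n):
        ranks.append(rank(power))
        power = matmul(power, shift)
    return ranks

def satisfies_5_29(a, b, eigenvalues):
    """Check condition (5.29) for every eigenvalue in the supplied list."""
    return all(shifted_power_ranks(a, lam) == shifted_power_ranks(b, lam)
               for lam in eigenvalues)

A = [[2, 0], [1, 2]]   # one Jordan block of order 2
B = [[2, 1], [0, 2]]   # its transpose
C = [[2, 0], [0, 2]]   # two blocks of order 1

a_b_similar = satisfies_5_29(A, B, [2])
a_c_similar = satisfies_5_29(A, C, [2])
```

All three matrices have the single eigenvalue 2, but only $A$ and $B$ share the rank sequence $1, 0$; for $C$ it is $0, 0$, so $C$ is not similar to $A$.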


As an additional application of formulas (5.27), let us determine when a matrix
can be brought into diagonal form, which is a special case of Jordan form in which
all the Jordan blocks are of order 1. In other words, all the cyclic subspaces are
of dimension one. This means that $l_2 = \cdots = l_p = 0$. From the second equality
in formulas (5.27), it follows that for this, it is necessary and sufficient that the
condition $r_1 = r_2$ be satisfied (for sufficiency, we must use the fact that $l_i \geq 0$). We
have thus proved the following criterion.


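Theorem 5.18 A linear transformation $\mathcal{A}$ can be brought into diagonal form if and
only if for every one of its eigenvalues $\lambda$, we have

$$\operatorname{rk} (\mathcal{A} - \lambda\mathcal{E}) = \operatorname{rk} (\mathcal{A} - \lambda\mathcal{E})^2.$$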


Of course, an analogous criterion holds for matrices. 
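
For matrices, the criterion of Theorem 5.18 amounts to comparing two ranks per eigenvalue. The sketch below assumes that the eigenvalues are known in advance and passed in as a list, and that the entries are integers or rationals; `rank`, `matmul`, and `is_diagonalizable` are hypothetical helper names.

```python
from fractions import Fraction

def rank(matrix):
    """Rank by Gaussian elimination over the rationals (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_diagonalizable(matrix, eigenvalues):
    """Test rk(A - lam*E) == rk((A - lam*E)^2) for each supplied eigenvalue."""
    for lam in eigenvalues:
        shift = [[a - (lam if i == j else 0) for j, a in enumerate(row)]
                 for i, row in enumerate(matrix)]
        if rank(shift) != rank(matmul(shift, shift)):
            return False
    return True

jordan = [[2, 0], [1, 2]]       # a single Jordan block of order 2
triangular = [[1, 1], [0, 2]]   # distinct eigenvalues 1 and 2

jordan_ok = is_diagonalizable(jordan, [2])
triangular_ok = is_diagonalizable(triangular, [1, 2])
```

The Jordan block fails the test, since the rank drops from 1 to 0 on squaring, while the triangular matrix with two distinct eigenvalues passes it.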


5.4 Real Vector Spaces 


Up to this point, we have been considering linear transformations of complex vector 
spaces (this is related to the fact that we have continually relied on the existence 
of an eigenvector for every linear transformation, which may not be true in the real 
case). However, the theory that we have built up gives us a great deal of information 
about the case of transformations of real vector spaces as well, which are especially 
important in applications. 

Let us assume that the real vector space $L_0$ is embedded in the complex vector
space $L$, for example its complexification (as was done in Sect. 4.3), while a linear
transformation $\mathcal{A}_0$ of the space $L_0$ determines a real linear transformation $\mathcal{A}$ of the
space $L$. In this section and the following one, a bar will denote complex
conjugation.




Theorem 5.19 In the decomposition of the space $L$ into cyclic subspaces with
respect to the real linear transformation $\mathcal{A}$, the number of cyclic $m$-dimensional
subspaces associated with the eigenvalue $\lambda$ is equal to the number of cyclic
$m$-dimensional subspaces associated with the complex-conjugate eigenvalue $\bar{\lambda}$.


Proof Since the characteristic polynomial of a real transformation $\mathcal{A}$ has real coefficients, it follows that for each root $\lambda$, the number $\bar\lambda$ is also a root of the characteristic polynomial. Let us denote, as we did in the proof of Theorem 5.15, the number of cyclic $m$-dimensional subspaces for the eigenvalue $\lambda$ by $l_m$, and the number of cyclic $m$-dimensional subspaces for the eigenvalue $\bar\lambda$ by $l'_m$. In addition, we define $r_i = \operatorname{rk}(\mathcal{A} - \lambda\mathcal{E})^i$ and $r'_i = \operatorname{rk}(\mathcal{A} - \bar\lambda\mathcal{E})^i$. Formulas (5.28) express the numbers $l_m$ in terms of $r_i$. Since these formulas hold for every eigenvalue, they also express the numbers $l'_m$ in terms of $r'_i$. Consequently, it suffices to show that $r'_i = r_i$, from which it will follow that $l'_m = l_m$, which is the assertion of the theorem.

To this end, we consider some basis of the space $\mathsf{L}_0$ (as a real vector space). It will also be a basis of the space $\mathsf{L}$ (as a complex vector space). Let $A$ be the matrix of the linear transformation $\mathcal{A}$ in this basis. By definition, it coincides with the matrix of the linear transformation $\mathcal{A}_0$ in the same basis, and therefore, it consists of real numbers. Hence the matrix $A - \bar\lambda E$ is obtained from $A - \lambda E$ by replacing all the elements by their complex conjugates. We shall write this as

$$A - \bar\lambda E = \overline{A - \lambda E}.$$

It is easy to see that from this, it follows that for every $i > 0$, the equation

$$(A - \bar\lambda E)^i = \overline{(A - \lambda E)^i}$$

is satisfied. Thus our assertion is reduced to the following: if $B$ is a matrix with complex elements and the matrix $\bar B$ is obtained from $B$ by replacing all its elements with their complex conjugates, then $\operatorname{rk} \bar B = \operatorname{rk} B$. The proof of this follows at once, however, from the definition of the rank of a matrix as the maximal order of the nonzero minors: indeed, it is clear that the minors of the matrix $\bar B$ are obtained by complex conjugation from the minors of $B$ with the same indices of rows and columns, which completes the proof of the theorem.


Thus according to Theorem 5.19, the Jordan normal form (5.19) of a real linear transformation consists of Jordan blocks (5.20) corresponding to real eigenvalues $\lambda_i$ and pairs of Jordan blocks of the same order corresponding to complex-conjugate pairs of eigenvalues $\lambda_i$ and $\bar\lambda_i$.

Let us see what this gives us for the classification of linear transformations of a real vector space $\mathsf{L}_0$. Let us consider the simple example of the case $\dim \mathsf{L}_0 = 2$. By Theorem 5.19, the Jordan normal form of the linear transformation $\mathcal{A}$ of the complex space $\mathsf{L}$ can have one of the three following forms:

$$\text{(a)}\ \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}, \qquad \text{(b)}\ \begin{pmatrix} \alpha & 0 \\ 1 & \alpha \end{pmatrix}, \qquad \text{(c)}\ \begin{pmatrix} \lambda & 0 \\ 0 & \bar\lambda \end{pmatrix},$$

where $\alpha$ and $\beta$ are real, and $\lambda$ is a complex, not real, number, that is, $\lambda = a + ib$, where $i^2 = -1$ and $b \neq 0$.

In cases (a) and (b), as can be seen from the definition of the linear transformation $\mathcal{A}$, the matrix of the transformation $\mathcal{A}_0$ already has the indicated form in some basis of the real vector space $\mathsf{L}_0$.

As we showed in Sect. 4.3, in case (c), the transformation $\mathcal{A}_0$ has in some basis the matrix

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}.$$

Thus we see that an arbitrary linear transformation of a two-dimensional real vector space has in some basis one of three forms:

$$\text{(a)}\ \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}, \qquad \text{(b)}\ \begin{pmatrix} \alpha & 0 \\ 1 & \alpha \end{pmatrix}, \qquad \text{(c)}\ \begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \tag{5.30}$$

where $\alpha$, $\beta$, $a$, $b$ are real numbers and $b \neq 0$. By formula (3.43), this implies that an arbitrary real square matrix of order 2 is similar to a matrix having one of the three forms of (5.30).
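As an illustration of this classification, the sketch below decides which of the three forms (5.30) a given real matrix of order 2 is similar to. It is only a sketch under stated assumptions: it works from the trace and determinant (the roots of $|A - \lambda E|$ are determined by them), the function name `real_normal_form` is hypothetical, and the test for a multiple root compares the discriminant with zero exactly, which is reliable for integer entries but not for general floating-point input.

```python
import math

def real_normal_form(A):
    """Return (label, matrix): the form of (5.30) similar to the real 2x2 matrix A."""
    (p, q), (r, s) = A
    tr, det = p + s, p * s - q * r
    disc = tr * tr - 4 * det              # discriminant of |A - lambda E|
    if disc > 0:                          # two distinct real roots
        root = math.sqrt(disc)
        return "a", [[(tr + root) / 2, 0], [0, (tr - root) / 2]]
    if disc == 0:                         # multiple real root alpha
        alpha = tr / 2
        if q == 0 and r == 0:             # A is already alpha * E
            return "a", [[alpha, 0], [0, alpha]]
        return "b", [[alpha, 0], [1, alpha]]
    a, b = tr / 2, math.sqrt(-disc) / 2   # roots a + ib and a - ib, b != 0
    return "c", [[a, -b], [b, a]]

print(real_normal_form([[2, 1], [1, 2]]))    # ('a', [[3.0, 0], [0, 1.0]])
print(real_normal_form([[2, 0], [1, 2]]))    # ('b', [[2.0, 0], [1, 2.0]])
print(real_normal_form([[0, -1], [1, 0]]))   # ('c', [[0.0, -1.0], [1.0, 0.0]])
```

In case (c) the sign of $b$ is a matter of choice, since the matrices with $b$ and $-b$ are similar; the sketch returns $b > 0$.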

In a completely analogous way, we may study the general case of linear transformations in a real vector space of arbitrary dimension.¹ By the same line of argument, one can show that every real square matrix is similar to a block-diagonal matrix

$$A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix},$$

where $A_i$ is either a Jordan block (5.20) with a real eigenvalue $\lambda_i$ or a matrix of even order having the block form

$$A_i = \begin{pmatrix} \Lambda_i & 0 & 0 & \cdots & 0 \\ E & \Lambda_i & 0 & \cdots & 0 \\ 0 & E & \Lambda_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & E & \Lambda_i \end{pmatrix},$$

in which the blocks $\Lambda_i$ and $E$ are matrices of order 2:

$$\Lambda_i = \begin{pmatrix} a_i & -b_i \\ b_i & a_i \end{pmatrix}, \qquad E = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

¹ One may find a detailed proof in, for example, the book Lectures on Algebra, by D.K. Faddeev (in Russian) or in Sect. 3.4 of Matrix Analysis, by Roger Horn and Charles Johnson. See the references section for details.


5.5 Applications* 


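For a matrix $A$ in Jordan normal form, it is easy to calculate the value of $f(A)$, where $f(x)$ is any polynomial of degree $n$. First of all, let us note that if the matrix $A$ is in block-diagonal form

$$A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix}$$

with arbitrary blocks $A_1, \ldots, A_r$, then

$$f(A) = \begin{pmatrix} f(A_1) & 0 & \cdots & 0 \\ 0 & f(A_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(A_r) \end{pmatrix}.$$

This follows immediately from the decomposition of the space $\mathsf{L}$ as $\mathsf{L} = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_r$, a direct sum of invariant subspaces, and from the fact that a linear transformation with matrix $A$ defines on $\mathsf{L}_i$ a linear transformation with matrix $A_i$.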

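Thus it remains only to consider the case that $A$ is a Jordan block, that is,

$$A = \begin{pmatrix} \lambda & 0 & 0 & \cdots & 0 \\ 1 & \lambda & 0 & \cdots & 0 \\ 0 & 1 & \lambda & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & \lambda \end{pmatrix}. \tag{5.31}$$

It will be convenient to represent it in the form $A = \lambda E + B$, where

$$B = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix}. \tag{5.32}$$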


Let us now write down Taylor's formula for a polynomial of degree $n$:

$$f(x + y) = f(x) + f'(x)\,y + \frac{f''(x)}{2!}\,y^2 + \cdots + \frac{f^{(n)}(x)}{n!}\,y^n. \tag{5.33}$$

We note that for the derivation of formula (5.33), we have to compute the binomial expansion of $(x + y)^k$, $k = 2, \ldots, n$, and then, of course, use commutativity of multiplication of numbers. If the commutative property did not hold, then we would not be able to obtain, for example, the expression $(x + y)^2 = y^2 + 2xy + x^2$, but only $(x + y)^2 = y^2 + yx + xy + x^2$. Therefore, in formula (5.33), we may replace $x$ and $y$ by numbers, and also by matrices, though not by arbitrary ones: only by matrices that commute.
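The role of commutativity can be seen on a pair of matrices of order 2. The sketch below is an illustration added here, with matrices `X` and `Y` chosen only because they do not commute; it compares $(X + Y)^2$ with the four-term expansion and with the binomial expression.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matadd(*Ms):
    """Entrywise sum of any number of matrices of the same size."""
    return [[sum(M[i][j] for M in Ms) for j in range(len(Ms[0][0]))]
            for i in range(len(Ms[0]))]

def scale(c, M):
    return [[c * x for x in row] for row in M]

X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]          # XY != YX

S = matadd(X, Y)
square = matmul(S, S)                                         # (X + Y)^2
four_terms = matadd(matmul(Y, Y), matmul(Y, X), matmul(X, Y), matmul(X, X))
binomial = matadd(matmul(Y, Y), scale(2, matmul(X, Y)), matmul(X, X))

print(square == four_terms)   # True
print(square == binomial)     # False
```

Here $(X + Y)^2 = E$, while $Y^2 + 2XY + X^2$ has the entry 2 in its upper left corner and zeros elsewhere.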

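Let us substitute in formula (5.33) the arguments $x = \lambda E$ and $y = B$, since the matrices $\lambda E$ and $B$ obviously commute. Since, as is easily verified, $f(\lambda E) = f(\lambda)E$ for an arbitrary polynomial $f$, we obtain the expression

$$f(A) = f(\lambda)E + f'(\lambda)B + \frac{f''(\lambda)}{2!}B^2 + \cdots + \frac{f^{(n)}(\lambda)}{n!}B^n. \tag{5.34}$$

We now observe that in the basis $\boldsymbol{e}_1, \ldots, \boldsymbol{e}_m$ of the cyclic subspace generated by the principal vector $\boldsymbol{e}$ of grade $m$, the transformation $\mathcal{B}$ with matrix $B$ of the form (5.32) assumes the following form:

$$\mathcal{B}(\boldsymbol{e}_i) = \begin{cases} \boldsymbol{e}_{i+1} & \text{for } i \le m - 1, \\ \boldsymbol{0} & \text{for } i > m - 1. \end{cases}$$

Applying the formula $k$ times, we obtain that

$$\mathcal{B}^k(\boldsymbol{e}_i) = \begin{cases} \boldsymbol{e}_{i+k} & \text{for } i \le m - k, \\ \boldsymbol{0} & \text{for } i > m - k. \end{cases}$$

From this, it is clear that the matrix $B^k$ has the following very simple form:

$$B^k = \begin{pmatrix} 0 & \cdots & \cdots & \cdots & 0 \\ \vdots & & & & \vdots \\ 1 & \ddots & & & \vdots \\ \vdots & \ddots & \ddots & & \vdots \\ 0 & \cdots & 1 & \cdots & 0 \end{pmatrix}.$$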


In order to describe this in words, we shall call the collection of elements $a_{ij}$ in the matrix $A = (a_{ij})$ with $i = j$ the main diagonal, while the collection of elements $a_{ij}$ with $i - j = k$ (where $k$ is a given number) forming a diagonal parallel to the main diagonal will be called the diagonal lying $k$ steps from the main diagonal. Thus in the matrix $B^k$, the diagonal lying $k$ steps from the main diagonal contains all 1's, while the remaining matrix entries are zero.

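Formula (5.34) now gives for a Jordan block $A$ of order $m$ the expression

$$f(A) = \begin{pmatrix} \varphi_0 & 0 & 0 & \cdots & 0 & 0 \\ \varphi_1 & \varphi_0 & 0 & \cdots & 0 & 0 \\ \varphi_2 & \varphi_1 & \varphi_0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \varphi_{m-2} & \varphi_{m-3} & \varphi_{m-4} & \cdots & \varphi_0 & 0 \\ \varphi_{m-1} & \varphi_{m-2} & \varphi_{m-3} & \cdots & \varphi_1 & \varphi_0 \end{pmatrix}, \tag{5.35}$$

where $\varphi_k = f^{(k)}(\lambda)/k!$, that is, the numbers $\varphi_k$ are the coefficients in the Taylor expansion (5.34).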
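Formula (5.35) can be checked on a small case. The sketch below is an illustration under stated assumptions: a polynomial is represented by its list of coefficients (`coeffs[k]` multiplies $x^k$), the eigenvalue is an integer so that exact arithmetic applies, and the function names are chosen here. It fills in the matrix (5.35) from the numbers $\varphi_k$ and compares it with a direct evaluation of $f(A)$.

```python
from fractions import Fraction
from math import factorial

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def poly_derivative(coeffs):
    return [k * c for k, c in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    result = 0
    for c in reversed(coeffs):      # Horner's scheme
        result = result * x + c
    return result

def poly_of_jordan_block(coeffs, lam, m):
    """f(A) for the Jordan block A of order m (lam on the diagonal, 1 below it)."""
    phi, d = [], list(coeffs)
    for k in range(m):
        phi.append(Fraction(poly_eval(d, lam), factorial(k)))   # phi_k = f^(k)(lam) / k!
        d = poly_derivative(d)
    return [[phi[i - j] if i >= j else Fraction(0) for j in range(m)] for i in range(m)]

def poly_of_matrix(coeffs, A):
    """Direct evaluation of f(A) by Horner's scheme, used only as a cross-check."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = matmul(result, A)
        result = [[result[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return result

coeffs = [1, 2, 0, 1]                       # f(x) = x^3 + 2x + 1
A = [[2, 0, 0], [1, 2, 0], [0, 1, 2]]       # Jordan block, lambda = 2, m = 3
print(poly_of_jordan_block(coeffs, 2, 3) == poly_of_matrix(coeffs, A))   # True
```

In this example $\varphi_0 = f(2) = 13$, $\varphi_1 = f'(2) = 14$, and $\varphi_2 = f''(2)/2! = 6$, so $f(A)$ has 13 on the main diagonal, 14 one step below it, and 6 two steps below it.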

Let us look at a very simple example. Suppose we wish to raise a matrix $A$ of order 2 to a very high power $p$ (for example, $p = 2000$). To perform such calculations by hand seems hopeless. But the theory that we have constructed proves here to be very useful. Let us find an eigenvalue of the linear transformation $\mathcal{A}$ with matrix $A$, that is, a root of the second-degree trinomial $|A - \lambda E|$. Here two cases are possible.


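Case 1. The trinomial $|A - \lambda E|$ has distinct roots $\lambda_1$ and $\lambda_2$. We can easily find the associated eigenvectors $\boldsymbol{e}_1$ and $\boldsymbol{e}_2$, for which

$$(\mathcal{A} - \lambda_1\mathcal{E})(\boldsymbol{e}_1) = \boldsymbol{0}, \qquad (\mathcal{A} - \lambda_2\mathcal{E})(\boldsymbol{e}_2) = \boldsymbol{0}.$$

As we know, the vectors $\boldsymbol{e}_1$ and $\boldsymbol{e}_2$ are linearly independent, and in the basis $\boldsymbol{e}_1, \boldsymbol{e}_2$, the transformation $\mathcal{A}$ has the diagonal matrix $\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$. If $C$ is the transition matrix from the original basis in which the transformation $\mathcal{A}$ has matrix $A$ to the basis $\boldsymbol{e}_1, \boldsymbol{e}_2$, then

$$A = C^{-1} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} C, \tag{5.36}$$

from which we easily obtain, for any $p$ (as large as desired), the formula

$$A^p = C^{-1} \begin{pmatrix} \lambda_1^p & 0 \\ 0 & \lambda_2^p \end{pmatrix} C. \tag{5.37}$$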


Let us now consider the second case. 


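Case 2. The trinomial $|A - \lambda E|$ has a multiple root $\lambda$ (which therefore must be real). Then the Jordan normal form of the matrix $A$ is either the single block $\begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix}$ or the diagonal matrix $\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$. In the latter variant, the Jordan normal form of the matrix is equal to $\lambda E$, and therefore the matrix $A$ is also equal to $\lambda E$ (this follows, for example, from the fact that if in some basis, a linear transformation has the matrix $\lambda E$, then it will have the same matrix in every other basis as well). Thus in this last variant we are dealing with the previous case, in which $\lambda_1 = \lambda_2 = \lambda$, and the calculation of $A^p$ is obtained by formula (5.37), where we have only to substitute $\lambda$ for $\lambda_1$ and $\lambda_2$. It remains to consider the first variant. For a Jordan block $\begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix}$, by formula (5.35), we obtain

$$\begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix}^p = \begin{pmatrix} \lambda^p & 0 \\ p\lambda^{p-1} & \lambda^p \end{pmatrix}.$$

If $\boldsymbol{e}_1, \boldsymbol{e}_2$ are vectors such that

$$(\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}_1) \neq \boldsymbol{0}, \qquad \boldsymbol{e}_2 = (\mathcal{A} - \lambda\mathcal{E})(\boldsymbol{e}_1),$$

then in the basis $\boldsymbol{e}_1, \boldsymbol{e}_2$, the matrix of the transformation $\mathcal{A}$ is in Jordan normal form. We denote by $C$ the transition matrix to this basis, and using the transition formula

$$A = C^{-1} \begin{pmatrix} \lambda & 0 \\ 1 & \lambda \end{pmatrix} C,$$

we obtain

$$A^p = C^{-1} \begin{pmatrix} \lambda^p & 0 \\ p\lambda^{p-1} & \lambda^p \end{pmatrix} C. \tag{5.38}$$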


Formulas (5.37) and (5.38) solve our problem. 
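Both cases can be carried out for $p = 2000$ with exact arithmetic. The following sketch is an illustration only. It forms the matrix `V` whose columns are the coordinates of $\boldsymbol{e}_1, \boldsymbol{e}_2$ in the original basis and uses $A = VJV^{-1}$, where $J$ is the matrix of the transformation in the new basis; identifying $V^{-1}$ with the matrix $C$ of (5.36) and (5.38) is an assumption about the convention for transition matrices. The example matrices and function names are chosen here.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse2(M):
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def matpow_naive(A, p):
    """Repeated multiplication, used only as a cross-check."""
    result = [[1, 0], [0, 1]]
    for _ in range(p):
        result = matmul(result, A)
    return result

def power_distinct(lam1, lam2, e1, e2, p):
    """Case 1: eigenvectors e1, e2 for distinct eigenvalues lam1, lam2."""
    V = [[e1[0], e2[0]], [e1[1], e2[1]]]
    D = [[lam1 ** p, 0], [0, lam2 ** p]]
    return matmul(matmul(V, D), inverse2(V))

def power_jordan(lam, e1, e2, p):
    """Case 2 (p >= 1): e2 = (A - lam E) e1 is nonzero, Jordan block with 1 below the diagonal."""
    V = [[e1[0], e2[0]], [e1[1], e2[1]]]
    J = [[lam ** p, 0], [p * lam ** (p - 1), lam ** p]]
    return matmul(matmul(V, J), inverse2(V))

# Case 1: A = [[2, 1], [1, 2]] has eigenvalues 3, 1 with eigenvectors (1, 1), (1, -1).
A1 = [[2, 1], [1, 2]]
print(power_distinct(3, 1, (1, 1), (1, -1), 2000) == matpow_naive(A1, 2000))   # True

# Case 2: A = [[3, 1], [-1, 1]] has the double root 2; e1 = (1, 0), e2 = (A - 2E) e1 = (1, -1).
A2 = [[3, 1], [-1, 1]]
print(power_jordan(2, (1, 0), (1, -1), 2000) == matpow_naive(A2, 2000))        # True
```

The closed forms need only the powers $\lambda^p$ and three matrix products, whereas the cross-check performs 2000 multiplications.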

We can now apply the same ideas not only to polynomials, but to other functions, for example those given by a convergent power series. Such functions are called analytic. To do this, we need the concept of convergence of a sequence of matrices. Let us recall that the notion of convergence for a sequence of square matrices of a given order with real coefficients was defined earlier, in Sect. 4.4. Moreover, in that same section, we introduced on the set of such matrices the metric $r(A, B)$, thereby converting it into a metric space, on which the notion of convergence is defined automatically (see p. xvii). It is obvious that the metric $r(A, B)$ defined by formulas (4.36) and (4.37) is also a metric on the set of square matrices of a given order with complex coefficients, and therefore transforms it into a metric space.

With this definition, the convergence of a sequence of matrices $A^{(k)} = (a^{(k)}_{ij})$, $k = 1, 2, \ldots$, to a matrix $B = (b_{ij})$ means that $a^{(k)}_{ij} \to b_{ij}$ as $k \to \infty$ for all $i, j$. In this case, we write $A^{(k)} \to B$ as $k \to \infty$ or $\lim_{k\to\infty} A^{(k)} = B$. The matrix $B$ is called the limit of the sequence $A^{(k)}$, $k = 1, 2, \ldots$. Similarly, we can define the limit of a family of matrices $A(h)$ depending on a parameter $h$ assuming values that are not necessarily natural numbers (as was the case for a sequence), but real values, and approaching an arbitrary value $h_0$. By definition, $\lim_{h\to h_0} A(h) = B$ if $\lim_{h\to h_0} r(A(h), B) = 0$. In other words, this means that $\lim_{h\to h_0} a_{ij}(h) = b_{ij}$ for all $i, j$.

Just as in the case of numbers, once we have the notion of convergence of a sequence of matrices, it is possible to talk about the convergence of series of matrices. Without any alteration, we can transfer theorems on series known from analysis to series of matrices. Let the function $f(x)$ be defined by the power series

$$f(x) = a_0 + a_1 x + \cdots + a_k x^k + \cdots. \tag{5.39}$$

Then by definition,

$$f(A) = a_0 E + a_1 A + \cdots + a_k A^k + \cdots. \tag{5.40}$$

Suppose the power series (5.39) converges for $|x| < r$ and the matrix $A$ is in the form of a Jordan block (5.31) with eigenvalue $\lambda$ of absolute value less than $r$. Then, examining the sum of the first $k$ terms of the series (5.40) and passing to the limit $k \to \infty$, we obtain that the series (5.40) converges, and for $f(A)$, formula (5.35) holds. If we now take a matrix $A'$ similar to some Jordan block $A$, that is, related to it by $A' = C^{-1}AC$, where $C$ is some nonsingular matrix, then from the obvious relationship $(C^{-1}AC)^k = C^{-1}A^kC$, we obtain from (5.40) that

$$f(A') = C^{-1}(a_0 E + a_1 A + \cdots + a_k A^k + \cdots)C = C^{-1} f(A)\, C. \tag{5.41}$$


Formulas (5.35) and (5.41) allow us to compute $f(A)$ for any analytic function
$f(x)$. Using results from analysis, we can extend the notion of functions of matrices
to a wider class of functions (for example, to continuous functions with the help of 
the theorem on uniform approximation of continuous functions by polynomials). 
However, we shall not address these questions here. 

In applications, exponentials of matrices are of special importance. We recall that the exponential function of a number $x$ can be defined by the series

$$e^x = 1 + x + \frac{1}{2!}x^2 + \cdots + \frac{1}{k!}x^k + \cdots, \tag{5.42}$$

which, as proved in a course in analysis, converges for all real or complex numbers $x$. According to this, the exponential of a matrix $A$ is defined by the series

$$e^A = E + A + \frac{1}{2!}A^2 + \cdots + \frac{1}{k!}A^k + \cdots, \tag{5.43}$$

which converges for every matrix $A$ with real or complex entries.
Let us verify that if matrices $A$ and $B$ commute, then a basic property of the numerical exponential function is transferred to the matrix exponential function:

$$e^A e^B = e^{A+B}. \tag{5.44}$$

Indeed, substituting into the left-hand side of (5.44) the expressions (5.43) for $e^A$ and $e^B$, removing parentheses, and collecting like terms, we obtain

$$\begin{aligned} e^A e^B &= \Bigl(E + A + \frac{1}{2!}A^2 + \frac{1}{3!}A^3 + \cdots\Bigr)\Bigl(E + B + \frac{1}{2!}B^2 + \frac{1}{3!}B^3 + \cdots\Bigr) \\ &= E + (A + B) + \Bigl(\frac{1}{2}A^2 + AB + \frac{1}{2}B^2\Bigr) + \Bigl(\frac{1}{6}A^3 + \frac{1}{2}A^2B + \frac{1}{2}AB^2 + \frac{1}{6}B^3\Bigr) + \cdots \\ &= E + (A + B) + \frac{1}{2!}(A + B)^2 + \frac{1}{3!}(A + B)^3 + \cdots, \end{aligned}$$

which coincides with the expression (5.43) for $e^{A+B}$. To justify this computation, two things must be noted. First, as is known from analysis, the numeric series (5.42) corresponding to the exponential function (5.43) converges absolutely on the entire real axis (this allows the terms to be summed in arbitrary order). Second, the matrices $A$ and $B$ commute (without this, the last step would be impossible, as we know from the discussion following formula (5.33)).
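Relationship (5.44), and its failure for matrices that do not commute, can be observed numerically. The sketch below is an illustration, not the method of the text: it sums finitely many terms of the series (5.43) in floating-point arithmetic, which is adequate only for matrices with small entries, and the function name `expm`, the number of terms, and the tolerance are assumptions made here.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def expm(A, terms=60):
    """Partial sum E + A + A^2/2! + ... of the series (5.43)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]   # A^k / k!
        result = matadd(result, term)
    return result

def close(M, N, tol=1e-9):
    return all(abs(x - y) <= tol * max(1.0, abs(x), abs(y))
               for rm, rn in zip(M, N) for x, y in zip(rm, rn))

# A commuting pair: both matrices have the form aE + bN with the same N.
A = [[1, 0], [2, 1]]
B = [[3, 0], [-1, 3]]
print(close(matmul(expm(A), expm(B)), expm(matadd(A, B))))   # True

# A pair that does not commute.
X = [[0, 1], [0, 0]]
Y = [[0, 0], [1, 0]]
print(close(matmul(expm(X), expm(Y)), expm(matadd(X, Y))))   # False
```

For the second pair, $e^X e^Y$ has the entries 2, 1, 1, 1, while $e^{X+Y}$ has $\cosh 1$ on the diagonal and $\sinh 1$ off it.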
In particular, from (5.44) follows the important relationship

$$e^{A(t+s)} = e^{At} e^{As} \tag{5.45}$$

for all numbers $t$ and $s$ and every square matrix $A$. From this, it is easy to derive that

$$\frac{d}{dt} e^{At} = A e^{At} \tag{5.46}$$

(understanding that differentiation of the matrix function is to be taken elementwise).

Indeed, by the definition of differentiation,

$$\frac{d}{dt} e^{At} = \lim_{h\to 0} \frac{e^{A(t+h)} - e^{At}}{h},$$

while from (5.45), it follows that

$$\frac{e^{A(t+h)} - e^{At}}{h} = \frac{e^{At} e^{Ah} - e^{At}}{h} = e^{At}\, \frac{e^{Ah} - E}{h}.$$

Finally, from (5.43) we easily obtain the equality

$$\lim_{h\to 0} \frac{e^{Ah} - E}{h} = \lim_{h\to 0} h^{-1}\Bigl(Ah + \frac{1}{2!}(Ah)^2 + \cdots + \frac{1}{k!}(Ah)^k + \cdots\Bigr) = A.$$


All these considerations have numerous applications in the theory of differential equations. Let us consider a system of $n$ linear homogeneous differential equations

$$\frac{dx_i}{dt} = \sum_{j=1}^{n} a_{ij} x_j, \qquad i = 1, \ldots, n, \tag{5.47}$$

where $a_{ij}$ are certain constant coefficients and $x_i = x_i(t)$ are unknown differentiable functions of the variable $t$. Similarly to what was done earlier for systems of linear algebraic equations (Example 2.49, p. 62), the system of linear differential equations (5.47) can also be written down compactly in matrix form if we introduce the column vectors

$$\boldsymbol{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \frac{d\boldsymbol{x}}{dt} = \begin{pmatrix} dx_1/dt \\ \vdots \\ dx_n/dt \end{pmatrix}$$

and a square matrix of order $n$ consisting of the coefficients of the system: $A = (a_{ij})$. Then system (5.47) can be written in the form

$$\frac{d\boldsymbol{x}}{dt} = A\boldsymbol{x}. \tag{5.48}$$

The number $n$ is called the order of this system.

For any constant vector $\boldsymbol{x}_0$, let us consider the vector $\boldsymbol{x}(t) = e^{At}\boldsymbol{x}_0$, depending on the variable $t$. This vector satisfies the system (5.48). Indeed, for arbitrary matrices $A(t)$ and $B$ (possibly rectangular, provided that the number of columns of $A(t)$ coincides with the number of rows of $B$), if only the matrix $B$ is constant, one has the equality

$$\frac{d}{dt}\bigl(A(t)B\bigr) = \frac{dA(t)}{dt}\,B,$$

after which it remains to use relationship (5.46). Similarly, for arbitrary matrices $A(t)$ and $B$, where $B$ is constant and the number of columns of $B$ coincides with the number of rows of $A(t)$, we have the formula

$$\frac{d}{dt}\bigl(BA(t)\bigr) = B\,\frac{dA(t)}{dt}. \tag{5.49}$$

Since with $t = 0$, the matrix $e^{At}$ equals $E$, the solution $\boldsymbol{x}(t) = e^{At}\boldsymbol{x}_0$ satisfies the initial condition $\boldsymbol{x}(0) = \boldsymbol{x}_0$. But the uniqueness theorem proved in the theory of differential equations asserts that for a given $\boldsymbol{x}_0$, such a solution is unique. Thus we may obtain all solutions of the system (5.48) in the form $e^{At}\boldsymbol{x}_0$ if we consider the vector $\boldsymbol{x}_0$ not as fixed, but as taking all possible values in a space of dimension $n$.
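That $\boldsymbol{x}(t) = e^{At}\boldsymbol{x}_0$ satisfies system (5.48) and the initial condition can be checked numerically on an example. The sketch below is an illustration under stated assumptions: the exponential is approximated by a partial sum of the series (5.43), the derivative by a central difference quotient, and the matrix, the step `h`, and the function names are chosen here.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm(A, terms=60):
    """Partial sum of the series (5.43), adequate for matrices with small entries."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def solution(A, x0, t):
    """x(t) = e^{At} x0."""
    return apply(expm([[a * t for a in row] for row in A]), x0)

A = [[0.0, 1.0], [-2.0, -3.0]]
x0 = [1.0, 0.0]
t, h = 0.7, 1e-5

# Central difference approximation to dx/dt at time t.
plus, minus = solution(A, x0, t + h), solution(A, x0, t - h)
derivative = [(p - m) / (2 * h) for p, m in zip(plus, minus)]
right_hand_side = apply(A, solution(A, x0, t))

print(max(abs(d - r) for d, r in zip(derivative, right_hand_side)) < 1e-6)   # True
print(solution(A, x0, 0.0) == x0)                                            # True
```

The agreement is only up to the error of the difference quotient; the check does not replace the argument given in the text.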

Finally, it is also possible to obtain an explicit formula for the solutions. To this end, let us make a linear substitution of variables in the system of equations (5.48) according to the formula $\boldsymbol{y} = C^{-1}\boldsymbol{x}$, where $C$ is a nonsingular constant square matrix of order $n$. Then taking into account relationships (5.49), (5.48), and $\boldsymbol{x} = C\boldsymbol{y}$, we obtain

$$\frac{d\boldsymbol{y}}{dt} = C^{-1}\frac{d\boldsymbol{x}}{dt} = C^{-1}A\boldsymbol{x} = (C^{-1}AC)\boldsymbol{y}. \tag{5.50}$$

Formula (5.50) shows that the matrix $A$ of a system of linear differential equations under a linear replacement of variables changes according to the same law as the matrix of a linear transformation under a suitable change of basis. In accord with what we have done in previous sections, we may choose as $C$ a matrix that brings the matrix $A$ into Jordan normal form. As a result, the system (5.48) can be rewritten in the form

$$\frac{d\boldsymbol{y}}{dt} = A'\boldsymbol{y}, \tag{5.51}$$

where the matrix $A' = C^{-1}AC$ is in Jordan normal form.


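Let

$$A' = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_r \end{pmatrix}, \tag{5.52}$$

where the $A_i$ are Jordan blocks. Then system (5.51) is decomposed into $r$ systems

$$\frac{d\boldsymbol{y}_i}{dt} = A_i \boldsymbol{y}_i, \qquad i = 1, \ldots, r,$$

and for each of these, we can express the solution in the form $e^{A_i t}\boldsymbol{y}_i(0)$ and find the matrix $e^{A_i t}$ from the relationship (5.35). Here $f(x) = e^{xt}$, and consequently,

$$f^{(k)}(x) = \frac{d^k}{dx^k} e^{xt} = t^k e^{xt}, \qquad \varphi_k = \frac{t^k}{k!} e^{\lambda t}.$$

This implies that for blocks $A_i$ of the form (5.31) of order $m$, formula (5.35) gives us

$$e^{A_i t} = e^{\lambda t} \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 0 \\ t & 1 & 0 & \cdots & 0 & 0 \\ \frac{t^2}{2!} & t & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \frac{t^{m-2}}{(m-2)!} & \frac{t^{m-3}}{(m-3)!} & \frac{t^{m-4}}{(m-4)!} & \cdots & 1 & 0 \\ \frac{t^{m-1}}{(m-1)!} & \frac{t^{m-2}}{(m-2)!} & \frac{t^{m-3}}{(m-3)!} & \cdots & t & 1 \end{pmatrix}. \tag{5.53}$$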

This implies that the solutions of the system (5.48) can be decomposed into series whose lengths are equal to the orders of the Jordan blocks in the representation (5.52), and for a block of order $m$, all solutions of the given series can be expressed as linear combinations (with constant coefficients) of the functions

$$e^{\lambda t},\quad t e^{\lambda t},\quad \ldots,\quad t^{m-1} e^{\lambda t}. \tag{5.54}$$

It is easily verified that the collection of solutions of system (5.48) forms a vector space, where the addition of two vectors and multiplication of a vector by a scalar are defined just as were addition and multiplication by a scalar of the corresponding functions. The set of functions (5.54) forms a basis of the space of solutions of the system (5.48). In the theory of differential equations, such a set is called a fundamental system of solutions.
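Formula (5.53) can be compared with a direct summation of the series (5.43). The sketch below is an illustration, with the eigenvalue, the order of the block, the value of $t$, and the function names chosen here; the series is summed in floating-point arithmetic, so the comparison holds only up to rounding.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm(A, terms=60):
    """Partial sum of the series (5.43)."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def jordan_block(lam, m):
    """Block of the form (5.31): lam on the diagonal, 1 one step below it."""
    return [[lam if i == j else (1.0 if i - j == 1 else 0.0) for j in range(m)]
            for i in range(m)]

def exp_jordan_block(lam, m, t):
    """Closed form (5.53): entry (i, j) is e^{lam t} t^(i-j) / (i-j)! for i >= j."""
    return [[math.exp(lam * t) * t ** (i - j) / math.factorial(i - j) if i >= j else 0.0
             for j in range(m)] for i in range(m)]

lam, m, t = -0.5, 4, 2.0
series = expm([[a * t for a in row] for row in jordan_block(lam, m)])
closed = exp_jordan_block(lam, m, t)
print(all(abs(s - c) < 1e-12 for rs, rc in zip(series, closed) for s, c in zip(rs, rc)))  # True
```

The first column of the closed form contains exactly the functions (5.54), each divided by the corresponding factorial.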

In conclusion, let us say a few words about linear differential equations with real coefficients in the plane ($n = 2$) (that is, assuming that in system (5.48), the matrix $A$ and vector $\boldsymbol{x}$ are real). Here, we should distinguish four possibilities for the matrix $A$ and roots of the polynomial $|A - \lambda E|$:

(a) The roots are real and distinct ($\alpha$ and $\beta$).

(b) There is a multiple root $\alpha$ (necessarily real) and $A = \alpha E$.

(c) There is a multiple root $\alpha$, but $A \neq \alpha E$.

(d) The roots are complex conjugate: $a + ib$ and $a - ib$ (here $i^2 = -1$ and $b \neq 0$).

In each of these cases, the matrix $A$ can be brought (by multiplication on the left by $C^{-1}$ and on the right by $C$, where $C$ is some nonsingular real matrix) into the following normal forms:

$$\text{(a)}\ \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}, \qquad \text{(b)}\ \begin{pmatrix} \alpha & 0 \\ 0 & \alpha \end{pmatrix}, \qquad \text{(c)}\ \begin{pmatrix} \alpha & 0 \\ 1 & \alpha \end{pmatrix}, \qquad \text{(d)}\ \begin{pmatrix} a & -b \\ b & a \end{pmatrix}.$$


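The solution $\boldsymbol{x}(t)$ of the associated differential equation is obtained in the form $\boldsymbol{x}(t) = e^{At}\boldsymbol{x}_0$, where $\boldsymbol{x}_0 = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}$ is the vector of initial data. Further, we can use formula (5.53), considering that the matrix $A$ of the system has the normal form (a), (b), (c), or (d). Here in cases (a)–(c), we will obtain

$$\text{(a)}\ \boldsymbol{x}(t) = \begin{pmatrix} c_1 e^{\alpha t} \\ c_2 e^{\beta t} \end{pmatrix}, \qquad \text{(b)}\ \boldsymbol{x}(t) = \begin{pmatrix} c_1 e^{\alpha t} \\ c_2 e^{\alpha t} \end{pmatrix}, \tag{5.55}$$

$$\text{(c)}\ \boldsymbol{x}(t) = \begin{pmatrix} e^{\alpha t} & 0 \\ t e^{\alpha t} & e^{\alpha t} \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} c_1 e^{\alpha t} \\ (c_1 t + c_2) e^{\alpha t} \end{pmatrix}. \tag{5.56}$$

In case (d), we obtain $\boldsymbol{x}(t) = e^{At}\boldsymbol{x}_0$, where $A = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}$. In Example 4.2 (p. 134) we established that $A$ is the matrix of a linear transformation of the plane $\mathbb{C}$ with complex variable $z$ that multiplies $z$ by the complex number $a + ib$. This means, by the definition of the exponential function, that $e^{At}$ is the matrix of multiplication of $z$ by the complex number $e^{(a+ib)t}$. By Euler's formula,

$$e^{(a+ib)t} = e^{at}(\cos bt + i \sin bt) = p + iq,$$

where $p = e^{at}\cos bt$ and $q = e^{at}\sin bt$. Thus we obtain a linear transformation of the real plane $\mathbb{C}$ with complex variable $z$ that multiplies each complex number $z \in \mathbb{C}$ by the given complex number $p + iq$. As we saw in Example 4.2, the matrix of such a linear transformation has the form (4.2). Multiplying it by the column vector $\boldsymbol{x}_0$ of initial data and substituting the expressions $p = e^{at}\cos bt$ and $q = e^{at}\sin bt$, we obtain our final formula:

$$\boldsymbol{x}(t) = \begin{pmatrix} p & -q \\ q & p \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = e^{at} \begin{pmatrix} c_1 \cos bt - c_2 \sin bt \\ c_1 \sin bt + c_2 \cos bt \end{pmatrix}. \tag{5.57}$$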
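The identification used in case (d) can be checked directly: the vector given by formula (5.57) should have as coordinates the real and imaginary parts of $e^{(a+ib)t}(c_1 + ic_2)$. The sketch below is an illustration with arbitrarily chosen numbers and function names.

```python
import cmath
import math

def solution_case_d(a, b, c1, c2, t):
    """Formula (5.57): rotation through the angle bt combined with scaling by e^{at}."""
    growth = math.exp(a * t)
    return (growth * (c1 * math.cos(b * t) - c2 * math.sin(b * t)),
            growth * (c1 * math.sin(b * t) + c2 * math.cos(b * t)))

def solution_via_complex(a, b, c1, c2, t):
    """Multiply z = c1 + i c2 by the complex number e^{(a + ib)t}."""
    z = cmath.exp(complex(a, b) * t) * complex(c1, c2)
    return (z.real, z.imag)

a, b, c1, c2, t = -0.3, 2.0, 1.0, 0.5, 1.7
u = solution_case_d(a, b, c1, c2, t)
v = solution_via_complex(a, b, c1, c2, t)
print(all(math.isclose(x, y, rel_tol=1e-12, abs_tol=1e-12) for x, y in zip(u, v)))   # True
```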

The plane of variables $(x_1, x_2)$ is called the phase plane of the system (5.48) for $n = 2$. Formulas (5.55)–(5.57) define (in parametric form) certain curves in the phase plane, where to each pair of values $c_1, c_2$ there corresponds in general a curve passing through the point $(c_1, c_2)$ of the phase plane for $t = 0$. These oriented curves (the orientation is given by the direction of motion corresponding to an increase in the parameter $t$) are called phase curves of system (5.48), and the collection of all phase curves corresponding to all possible values of $c_1, c_2$ is called the phase portrait of the system. Let us pose the following question: What does the phase portrait of the system (5.48) look like in cases (a)–(d)?

First of all, we note that among all solutions $\boldsymbol{x}(t)$ there is always the constant $\boldsymbol{x}(t) = \boldsymbol{0}$. It is obtained by substituting in formulas (5.55)–(5.57) the initial values $c_1 = c_2 = 0$. The phase curve corresponding to this solution is simply the point $x_1 = x_2 = 0$. Constant solutions (and their corresponding phase curves, points in the phase plane) are called singular points or equilibrium points or fixed points of the differential equation.² Just as the study of a function usually begins with a search for its extreme points, so a study of a differential equation usually begins with a search for its singular points.

Are there singular points of system (5.48) other than $x_1 = x_2 = 0$? Singular points are the constant solutions of a system of equations, and since the derivative of a constant solution is identically equal to zero (that is, the left-hand side of system (5.48) is identically zero), this means that the right-hand side of system (5.48) must also be identically equal to zero. Therefore, singular points are precisely the solutions of the system of linear homogeneous equations $A\boldsymbol{x} = \boldsymbol{0}$. If the matrix $A$ is nonsingular, then the system $A\boldsymbol{x} = \boldsymbol{0}$ has no solutions other than the null solution, and therefore, system (5.48) has no singular points other than $x_1 = x_2 = 0$. If the matrix $A$ is singular and its rank is equal to 1, then system (5.48) has an infinite number of singular points lying on a line in the phase plane. But in the case that the rank of the matrix $A$ is equal to 0, all points of the phase plane are singular points.

² This name comes from the fact that if at some moment in time, a material point whose motion is described by system (5.48) is located at a singular point, then it will remain there forever.

In the sequel, we will assume that the matrix $A$ is nonsingular and examine what sorts of phase portraits correspond to the cases (a)–(d) presented above. In all the figures, the $x$-axis corresponds to the variable $x_1$, while the $y$-axis represents the variable $x_2$.

(a) The roots $\alpha$ and $\beta$ are real and distinct. In this case, there are three possibilities: $\alpha$ and $\beta$ have different signs, both are negative, or both are positive.

(a.1) If $\alpha$ and $\beta$ have different signs, then the singular point is called a saddle. For definiteness, let us assume that $\alpha < 0$ and $\beta > 0$. To the initial value $c_1 \neq 0$, $c_2 = 0$ there corresponds the solution $x_1(t) = c_1 e^{\alpha t}$, $x_2(t) = 0$, passing through the point $(c_1, 0)$ at $t = 0$. The associated phase curve is the horizontal ray $x_1 > 0$, $x_2 = 0$ (if $c_1 > 0$) or $x_1 < 0$, $x_2 = 0$ (if $c_1 < 0$) such that the direction along the curve with increasing $t$ is toward the singular point $x_1 = x_2 = 0$.

Similarly, to the initial point $c_1 = 0$, $c_2 \neq 0$ corresponds the solution $x_1(t) = 0$, $x_2(t) = c_2 e^{\beta t}$, passing through the point $(0, c_2)$ at $t = 0$. The associated phase curve is the vertical ray $x_1 = 0$, $x_2 > 0$ (if $c_2 > 0$) or $x_1 = 0$, $x_2 < 0$ (if $c_2 < 0$) such that the direction along the curve for increasing $t$ is away from the singular point $x_1 = x_2 = 0$.

Thus there are two phase curves asymptotically approaching the singular point as $t \to +\infty$ (they are called stable separatrices), and two curves approaching it for $t \to -\infty$ (they are called unstable separatrices). Let us make one crucial observation: from the fact that $e^{\alpha t} \to 0$ for $t \to +\infty$ and $e^{\beta t} \to 0$ for $t \to -\infty$, it follows that stable and unstable separatrices approach a saddle arbitrarily closely as $t \to +\infty$ and $t \to -\infty$ respectively but never reach it in finite time.

The stable and unstable separatrices of a saddle partition the phase plane into four sectors. In our case (in which the matrix of system (5.48) is in Jordan form), the separatrices lie on the coordinate axes, and therefore, these sectors coincide with the Cartesian quadrants. Let us see how the remaining phase curves behave, those with initial values $c_1 \neq 0$, $c_2 \neq 0$. We observe first that if the initial point $(c_1, c_2)$ lies in any of the four sectors, then after passing through it for $t = 0$, the phase curve remains in that sector for all values of $t$. This follows obviously from the fact that the functions $x_1(t) = c_1 e^{\alpha t}$ and $x_2(t) = c_2 e^{\beta t}$ are of fixed sign.

For definiteness, let us consider the first quadrant $c_1 > 0$, $c_2 > 0$ (the other cases can be obtained from this one by a symmetry transformation with respect to the $x$- or $y$-axis or with respect to the origin). Let us raise the function $x_1(t) = c_1 e^{\alpha t}$ to the $\beta$ power, and the function $x_2(t) = c_2 e^{\beta t}$ to the $\alpha$ power. After dividing one by the other and canceling the factor $e^{\alpha\beta t}$, we obtain the relationship

$$\frac{x_1^{\beta}}{x_2^{\alpha}} = \frac{c_1^{\beta}}{c_2^{\alpha}} = c, \tag{5.58}$$

where the constant $c$ is determined by the initial values $c_1, c_2$. Since the numbers $\alpha$ and $\beta$ have opposite signs, the phase curve in the plane $(x_1, x_2)$ corresponding to this equation has a form similar to a hyperbola. This phase curve passes at some positive distance from the singular point $x_1 = x_2 = 0$, asymptotically approaching one of the unstable separatrices as $t \to +\infty$ and one of the stable separatrices as $t \to -\infty$. Such phase curves are said to be of hyperbolic or saddle type.

Fig. 5.1 Saddle and nodes (panels: saddle, stable node, unstable node)
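Relationship (5.58) says that the quantity $x_1^{\beta}/x_2^{\alpha}$ does not change along a phase curve in the first quadrant. The sketch below is an illustration with arbitrarily chosen values of $\alpha$, $\beta$, $c_1$, $c_2$ and function names; it evaluates this quantity at several moments of time.

```python
import math

def phase_point(c1, c2, alpha, beta, t):
    """Point (x1(t), x2(t)) of the phase curve through (c1, c2), as in case (a) of (5.55)."""
    return c1 * math.exp(alpha * t), c2 * math.exp(beta * t)

def first_integral(x1, x2, alpha, beta):
    """Left-hand side of (5.58); x1 and x2 are assumed positive."""
    return x1 ** beta / x2 ** alpha

alpha, beta = -1.0, 2.0          # a saddle: the roots have different signs
c1, c2 = 1.5, 0.5
c = first_integral(c1, c2, alpha, beta)

values = [first_integral(*phase_point(c1, c2, alpha, beta, t), alpha, beta)
          for t in (-2.0, -0.5, 0.0, 1.0, 2.0)]
print(all(math.isclose(v, c, rel_tol=1e-9) for v in values))   # True
```

With these numbers the relationship reads $x_1^2 x_2 = c_1^2 c_2$, a curve resembling a hyperbola with the coordinate axes as asymptotes.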

Thus in the case of a saddle, we have two stable separatrices approaching the singular point as $t \to +\infty$ and two unstable separatrices approaching it as $t \to -\infty$, and also an infinite number of saddle-type phase curves filling the four sectors into which the separatrices divide the phase plane. The associated phase portrait is shown in Fig. 5.1.

(a.2) If $\alpha$ and $\beta$ have the same sign, then the singular point is called a node. Moreover, if $\alpha$ and $\beta$ are negative, then the node is said to be stable, while if $\alpha$ and $\beta$ are positive, the node is unstable. The reason for this terminology will soon become clear.

For definiteness, we will restrict our examination to stable nodes (unstable nodes are studied similarly), that is, we shall assume that the numbers $\alpha$ and $\beta$ are negative. As in the case of a saddle, the phase curve corresponding to the initial value $c_1 \neq 0$, $c_2 = 0$ is the horizontal ray $x_1 > 0$, $x_2 = 0$ (if $c_1 > 0$) or $x_1 < 0$, $x_2 = 0$ (if $c_1 < 0$) such that the direction along the curve for increasing $t$ is toward the singular point. The phase curve corresponding to the initial value $c_1 = 0$, $c_2 \neq 0$ is the vertical ray $x_1 = 0$, $x_2 > 0$ (if $c_2 > 0$) or $x_1 = 0$, $x_2 < 0$ (if $c_2 < 0$) such that the direction along the curve for increasing $t$ is also toward the singular point.

As in the case of a saddle, it is clear that if the initial point $(c_1, c_2)$ lies in one of the four quadrants, then the phase curve passing through it for $t = 0$ remains in that quadrant for all values of $t$. Let us consider the first quadrant $c_1 > 0$, $c_2 > 0$. Proceeding as we did in the case of a saddle, we again obtain the equation (5.58). But now the numbers $\alpha$ and $\beta$ have the same sign, and the phase curve corresponding to this equation has quite a different form from that in the case of a saddle. After a transformation of (5.58), we obtain the power function $x_1 = c^{1/\beta} x_2^{\alpha/\beta}$. If $|\alpha| > |\beta|$, then the exponent $\alpha/\beta$ is greater than 1, and the graph of this function is similar to a branch of the parabola $x_1 = x_2^2$. However, if $|\alpha| < |\beta|$, then the exponent $\alpha/\beta$ is less than 1, and the graph of the function looks like a branch of the parabola $x_2 = x_1^2$. Thus in the case of a stable node, all the phase curves approach the singular point as $t \to +\infty$, while for $t \to -\infty$, they move away from it (for an unstable node we must exchange the positions of $+\infty$ and $-\infty$). Such phase curves are called parabolic. Phase portraits of stable and unstable nodes are depicted in Fig. 5.1.

Fig. 5.2 Dicritical and Jordan nodes (panels: stable dicritical node, stable Jordan node, unstable Jordan node)

It is now possible to explain the terminology stable and unstable. If a material 
point was located at an equilibrium point that was a stable node and was brought 
out from that point by some external action, then moving along the curve depicted 
in the phase portrait, it will strive to return to that position. But if it was an unstable 
node, then a material point brought out from an equilibrium point not only would 
not strive to return to that position, but on the contrary, it would move away from it 
with exponentially increasing speed. 

(b) If a matrix $A$ is similar to the matrix $\alpha E$, then the singular point is called a dicritical node or bicritical node. Proceeding in the same way as before, we obtain the relationship (5.58) with $\beta = \alpha$, from which follows the equation $x_1/x_2 = c_1/c_2$. All the phase curves are rays with origin at $x_1 = x_2 = 0$. Moreover, if $\alpha < 0$, then motion along them as $t \to +\infty$ proceeds toward the equilibrium point $x_1 = x_2 = 0$, while if $\alpha > 0$, then away from it. Thus in the case $\alpha < 0$ ($\alpha > 0$), we have a stable (unstable) dicritical node. The phase portrait of a stable dicritical node is depicted in Fig. 5.2. In the case of an unstable dicritical node, it is necessary only to change the directions of the arrows to their opposite.

(c) If the solution to the equation is given by formula (5.56), then the singular point is called a Jordan node. If $\alpha < 0$, then the Jordan node is stable, and if $\alpha > 0$, then it is unstable. For $c_1 = 0$, $c_2 \neq 0$, we obtain two phase curves, namely the vertical rays $x_1 = 0$, $x_2 > 0$ and $x_1 = 0$, $x_2 < 0$, along which the motion is in the direction of the singular point for $\alpha < 0$ and away from the singular point for $\alpha > 0$. In the investigation of phase curves for $c_1 \neq 0$, one must study the properties of the functions $x_1(t) = c_1 e^{\alpha t}$ and $x_2(t) = (c_1 t + c_2)e^{\alpha t}$ for $c_1 > 0$ and for $c_1 < 0$. As a result, for a stable (unstable) Jordan node, one obtains the phase portrait depicted in Fig. 5.2. All the phase curves (except the two vertical rays) look like pieces of a parabola, each of which lies entirely either in the right or left half-plane and intersects the $x$-axis in a single point.

(d) The roots are complex conjugates: a + ib and a — ib, where b ¥ 0. Here it is 
necessary to consider two cases: a 4 0 anda =0. 


5.5 Applications* 189 


Fig. 5.3 Foci and center (panels: stable focus, unstable focus, center)


(d.1) If $a \neq 0$, then the singular point is called a focus. In order to visualize the
behavior of phase curves given by formula (5.57), we observe that the vector $x(t)$ is
obtained from the vector $x_0$ with coordinates $(c_1, c_2)$ by rotating it through the angle
$bt$ and multiplying by $e^{at}$. Therefore, the phase curves are spirals that "wind" around
the singular point $x_1 = x_2 = 0$ as $t \to +\infty$ (if $a < 0$) or as $t \to -\infty$ (if $a > 0$). For
$a < 0$ and $a > 0$, a focus is said to be stable or unstable respectively. The direction
of motion along the spirals (clockwise or counterclockwise) is determined by the
sign of the number $b$. Fig. 5.3 shows phase portraits of a stable focus ($a < 0$)
and an unstable focus ($a > 0$) in the case $b > 0$, that is, the case in which the motion
along the spirals is counterclockwise.

(d.2) If $a = 0$, then the singular point $x_1 = x_2 = 0$ is called a center. Relationship
(5.57) defines in this case a rotation of the vector $x_0$ through the angle $bt$. The
phase curves are concentric circles with common center $x_1 = x_2 = 0$ along which
the motion is either clockwise or counterclockwise according to the sign of the
number $b$. The phase portrait of a center (for the case $b > 0$) is shown in Fig. 5.3.
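
The description of cases (d.1) and (d.2) can be checked numerically. The following Python sketch assumes that relationship (5.57) has the form $x(t) = e^{at} R(bt) x_0$, where $R(\theta)$ is the counterclockwise rotation through the angle $\theta$; that form is inferred from the verbal description above, and the names `phase_point` and `classify` are illustrative only.

```python
import math

def phase_point(a, b, c1, c2, t):
    """Point x(t): the vector (c1, c2) rotated through the angle b*t and scaled by e^(a*t).

    Assumption: counterclockwise rotation for b > 0, as in the description of Fig. 5.3.
    """
    scale = math.exp(a * t)
    cos_bt, sin_bt = math.cos(b * t), math.sin(b * t)
    x1 = scale * (cos_bt * c1 - sin_bt * c2)
    x2 = scale * (sin_bt * c1 + cos_bt * c2)
    return x1, x2

def classify(a, b):
    """Name the singular point for the roots a + ib and a - ib with b != 0."""
    if b == 0:
        raise ValueError("b must be nonzero for complex conjugate roots")
    if a == 0:
        return "center"
    return "stable focus" if a < 0 else "unstable focus"
```

For $a = 0$ the distance from the origin stays constant along a phase curve (a circle), while for $a < 0$ it shrinks by the factor $e^{at}$, which is the winding of the spiral toward the singular point.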


Chapter 6 
Quadratic and Bilinear Forms 


6.1 Basic Definitions 


Definition 6.1 A quadratic form in $n$ variables $x_1, \dots, x_n$ is a homogeneous
second-degree polynomial in these variables. Therefore, only terms of degree two
enter into this polynomial; that is, the terms are monomials of the form $\varphi_{ij} x_i x_j$ for
all possible values of $i, j = 1, \dots, n$, and so the polynomial has the form

$$\psi(x_1, \dots, x_n) = \sum_{i,j=1}^{n} \varphi_{ij} x_i x_j. \tag{6.1}$$

We note that in expression (6.1), there are like terms, since $x_i x_j = x_j x_i$. We
shall decide later how to deal with them.

Of course, every quadratic form (6.1) can be viewed as a function of the vector
$x = x_1 e_1 + \cdots + x_n e_n$, where $e_1, \dots, e_n$ is some fixed basis of the vector space $L$ of
dimension $n$. We shall write this as

$$\psi(x) = \sum_{i,j=1}^{n} \varphi_{ij} x_i x_j. \tag{6.2}$$

The given definition of quadratic form obviously is compatible with the more
general definition of form of arbitrary degree given in Sect. 3.8 (see p. 127). We
recall that in that section, a form of degree $k$ was defined as a function $F(x)$ of the
vector $x \in L$, where $F(x)$ is written as a homogeneous polynomial of degree $k$ in the
coordinates $x_1, \dots, x_n$ in some (and hence any) basis of this vector space. Thus for
$k = 2$, we obtain the above definition of quadratic form.

By a change in coordinates, that is, by a choice of another basis of the space $L$, a
quadratic form $\psi(x)$ will be written as previously in the form (6.2) with some other
coefficients $\varphi_{ij}$.

Quadratic forms have the property of being very similar to linear functions, and in 
the sequel, we shall unite the theory of quadratic forms with that of linear functions 
and transformations. The following notion will serve as a foundation for this. 


I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry,
DOI 10.1007/978-3-642-30994-6_6, © Springer-Verlag Berlin Heidelberg 2013 




Definition 6.2 A function $\varphi(x, y)$ that assigns to two vectors $x, y \in L$ a scalar value
is called a bilinear form on $L$ if it is linear in each of its arguments, that is, if for
every fixed $y \in L$, the function $\varphi(x, y)$ as a function of $x$ is linear on $L$ and for each
fixed $x \in L$, the function $\varphi(x, y)$ as a function of $y$ is linear on $L$.


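In other words, the following conditions must be satisfied for all vectors of the
space $L$ and scalars $\alpha$:

$$\begin{aligned}
\varphi(x_1 + x_2, y) &= \varphi(x_1, y) + \varphi(x_2, y), \\
\varphi(\alpha x, y) &= \alpha \varphi(x, y), \\
\varphi(x, y_1 + y_2) &= \varphi(x, y_1) + \varphi(x, y_2), \\
\varphi(x, \alpha y) &= \alpha \varphi(x, y).
\end{aligned} \tag{6.3}$$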


If the space $L$ consists of rows, we have a special case of the notion of multilinear
function, which was introduced in Sect. 2.7 (for $m = 2$).
If $e_1, \dots, e_n$ is some basis of $L$, then we can write

$$x = x_1 e_1 + \cdots + x_n e_n, \qquad y = y_1 e_1 + \cdots + y_n e_n,$$

and using equations (6.3), we obtain a formula that expresses (in the chosen basis)
the bilinear form $\varphi(x, y)$ in terms of the coordinates of the vectors $x$ and $y$:

$$\varphi(x, y) = \sum_{i,j=1}^{n} \varphi_{ij} x_i y_j, \quad \text{where } \varphi_{ij} = \varphi(e_i, e_j). \tag{6.4}$$


In this case, the square matrix $\Phi = (\varphi_{ij})$ is called the matrix of the bilinear form $\varphi$
in the basis $e_1, \dots, e_n$. In the case that $x$ and $y$ are rows, this formulation represents
a special way of writing an arbitrary multilinear function as introduced in Sect. 2.7
(Theorem 2.29).

The relationship (6.4) shows that the value of $\varphi(x, y)$ can be expressed in terms
of the elements of the matrix $\Phi$ and the coordinates of the vectors $x$ and $y$ in the
basis $e_1, \dots, e_n$, which means that a bilinear form, as a function of the arguments $x$
and $y$, is completely defined by its matrix $\Phi$. This same formula shows that if we
replace the argument $y$ in the bilinear form $\varphi(x, y)$ by $x$, where $x = (x_1, \dots, x_n)$,
we obtain the quadratic form $\psi(x) = \varphi(x, x)$, and moreover, any quadratic form
(6.1) can be obtained in this way; to do so, we need only choose a bilinear form
$\varphi(x, y)$ with matrix $\Phi = (\varphi_{ij})$ satisfying the condition $\varphi(e_i, e_j) = \varphi_{ij}$, where $\varphi_{ij}$
are the coefficients from (6.1).
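
Formula (6.4) and the passage from a bilinear form to a quadratic form can be illustrated by a short Python sketch. The matrix `PHI` below is a hypothetical example chosen for illustration (it is deliberately not symmetric), and the function names are not taken from the text.

```python
from fractions import Fraction

def bilinear(phi, x, y):
    """Value phi(x, y) = sum over i, j of phi[i][j] * x[i] * y[j], as in formula (6.4)."""
    n = len(phi)
    return sum(phi[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def quadratic(phi, x):
    """Quadratic form psi(x) = phi(x, x) obtained by substituting y = x."""
    return bilinear(phi, x, x)

# Hypothetical matrix of a bilinear form in the basis e_1, e_2.
PHI = [[Fraction(1), Fraction(2)],
       [Fraction(0), Fraction(3)]]

e1, e2 = [1, 0], [0, 1]
phi_12 = bilinear(PHI, e1, e2)   # recovers the matrix entry phi(e_1, e_2)
psi_at_11 = quadratic(PHI, [1, 1])
```

Evaluating the form on pairs of basis vectors returns the entries of the matrix, which is the sense in which the matrix completely defines the bilinear form.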

It is easily seen that the set of bilinear forms on a vector space L is itself a vector 
space if we define on it in a natural way the operations of addition of bilinear forms 
and multiplication by a scalar. Clearly, the null vector in such a space is the bilinear 
form that is identically equal to zero. 

The connection between the notion of bilinear form and that of linear
transformation is based on the following result, which uses the notion of dual space.




Theorem 6.3 There is an isomorphism between the space of bilinear forms $\varphi$ on
the vector space $L$ and the space $\mathcal{L}(L, L^*)$ of linear transformations $\mathcal{A} : L \to L^*$.


Proof Let $\varphi(x, y)$ be a bilinear form on $L$. Let us associate with it the linear
transformation $\mathcal{A} : L \to L^*$ as follows. By definition, $\mathcal{A}$ should assign to a vector $y \in L$ a
linear function $\psi(x)$ on $L$. We shall make this assignment by setting $\psi(x) = \varphi(x, y)$.
The verification that the transformation $\mathcal{A}$ thus defined is linear is trivial.

It is equally trivial to verify that the correspondence $\varphi \mapsto \mathcal{A}$ is a bijection. We
shall simply point out the inverse transformation of the set $\mathcal{L}(L, L^*)$ into the set of
bilinear forms. Let $\mathcal{A}$ be a linear transformation from $L$ to $L^*$ that to each vector
$y \in L$ assigns the linear function $\mathcal{A}(y) \in L^*$. This function takes the value $\mathcal{A}(y)(x)$
on the vector $x$, which we shall denote by $\varphi(x, y)$. Using the notation established in
Sect. 3.7 (p. 125) and keeping in mind that in this situation, $M = L^*$, we may write
$\varphi(x, y) = (x, \mathcal{A}(y))$ for arbitrary vectors $x, y \in L$.

Finally, it is completely obvious that the constructed mapping $\varphi \mapsto \mathcal{A}$ is an
isomorphism of vector spaces, that is, it satisfies the conditions $\varphi_1 + \varphi_2 \mapsto \mathcal{A}_1 + \mathcal{A}_2$
and $\lambda\varphi \mapsto \lambda\mathcal{A}$, where $\varphi_i \mapsto \mathcal{A}_i$ and $\lambda$ is an arbitrary scalar.


It follows from this theorem that the study of bilinear forms is analogous to that
of linear transformations $L \to L$ (although somewhat simpler). In mathematics and
physics, a special role is played by two particular types of bilinear form.


Definition 6.4 A bilinear form $\varphi(x, y)$ is said to be symmetric if

$$\varphi(x, y) = \varphi(y, x), \tag{6.5}$$

and antisymmetric if

$$\varphi(x, y) = -\varphi(y, x), \tag{6.6}$$

for all vectors $x, y \in L$.


We encountered special cases of both these concepts in Chap. 2, when the vectors
$x$ and $y$ were taken to be rows of numbers.
If following Theorem 6.3, we express the bilinear form $\varphi(x, y)$ in the form

$$\varphi(x, y) = (x, \mathcal{A}(y)) \tag{6.7}$$

with some linear transformation $\mathcal{A} : L \to L^*$, then the symmetry condition (6.5)
indicates that $(x, \mathcal{A}(y)) = (y, \mathcal{A}(x))$. Since $(y, \mathcal{A}(x)) = (x, \mathcal{A}^*(y))$, where $\mathcal{A}^* :
L^{**} \to L^*$ is the linear transformation dual to $\mathcal{A}$ (see p. 125), the condition can be rewritten
in the form $(x, \mathcal{A}(y)) = (x, \mathcal{A}^*(y))$. Since this relationship must be satisfied for all
vectors $x, y \in L$, it can be rewritten in the form $\mathcal{A} = \mathcal{A}^*$. Note that in view of the
equality $L^{**} = L$, both $\mathcal{A}$ and $\mathcal{A}^*$ are transformations from $L$ to $L^*$. Similarly, the
antisymmetry condition (6.6) of the bilinear form $\varphi(x, y)$ can be written in the form
$\mathcal{A} = -\mathcal{A}^*$.




Let us note that it suffices to verify the symmetry condition (6.5) and the antisymmetry
condition (6.6) for vectors $x$ and $y$ belonging to some particular basis $e_1, \dots, e_n$
of the space $L$. Indeed, if this condition is satisfied for vectors in the basis $e_1, \dots, e_n$,
that is, for example, in the case of symmetry, the equations $\varphi(e_i, e_j) = \varphi(e_j, e_i)$ are
satisfied for all $i, j = 1, \dots, n$, then from formula (6.4), it follows that the condition
(6.5) is met for all vectors $x, y \in L$. Recalling the definition of the matrix of a bilinear
form, we see that the form $\varphi$ is symmetric if and only if its matrix $\Phi$ is symmetric
in some basis of the space $L$ (that is, $\Phi = \Phi^*$). Similarly, the antisymmetry of the
bilinear form $\varphi$ is equivalent to the antisymmetry of $\Phi$ in some basis ($\Phi = -\Phi^*$).

The matrix $\Phi$ of a bilinear form depends on the basis $e_1, \dots, e_n$. We shall now
investigate this dependence. Here, we shall use the formula (3.38) for changing
coordinates that we derived in Sect. 3.4, and moreover, our reasoning will be similar
to what we used then in deriving this formula.

First of all, let us write down the relationship (6.4) in a more compact matrix
form. To this end, we observe that for rows $x = (x_1, \dots, x_n)$ and columns

$$[y] = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},$$

the sum in formula (6.4) can be rewritten in the following form:

$$\sum_{i,j=1}^{n} \varphi_{ij} x_i y_j = \sum_{i=1}^{n} x_i \Bigl( \sum_{j=1}^{n} \varphi_{ij} y_j \Bigr) = \sum_{i=1}^{n} x_i z_i, \quad \text{where } z_i = \sum_{j=1}^{n} \varphi_{ij} y_j.$$

By the rule of matrix multiplication, we obtain the expression

$$\sum_{i,j=1}^{n} \varphi_{ij} x_i y_j = x[z], \quad \text{where } [z] = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} = \Phi[y].$$

This means that we now have

$$\sum_{i,j=1}^{n} \varphi_{ij} x_i y_j = x \Phi [y].$$

Let us note that by similar arguments, or by simply taking the transpose of both
sides of the previous equality (on the left-hand side of which stands a scalar, that is,
a matrix of type (1, 1), which is invariant under the transpose operation), we obtain
a similar relationship

$$\sum_{i,j=1}^{n} \varphi_{ij} x_i y_j = y \Phi^* [x].$$



Thus if in some basis $e_1, \dots, e_n$, the matrix of the bilinear form $\varphi$ is equal to $\Phi$,
while the vectors $x$ and $y$ have coordinates $x_i$ and $y_j$, then we have the following
formula:

$$\varphi(x, y) = x \Phi [y]. \tag{6.8}$$

Similarly, for another basis $e'_1, \dots, e'_n$, we obtain the equality

$$\varphi(x, y) = x' \Phi' [y'], \tag{6.9}$$

where $\Phi'$ is the matrix of the bilinear form $\varphi$, while $x'_i$ and $y'_j$ are the coordinates of
the vectors $x$ and $y$ in the basis $e'_1, \dots, e'_n$.
Let $C$ be the transition matrix from the basis $e'_1, \dots, e'_n$ to the basis $e_1, \dots, e_n$.
Then by the substitution formula (3.36), we obtain the relationships $x = x' C^*$ and
$[y] = C[y']$. Substituting these expressions into (6.8), taking into account formula
(6.9), we obtain the identity

$$x' C^* \Phi C [y'] = x' \Phi' [y'],$$

which is satisfied for all $x'$ and $[y']$. From this, it follows that the matrices $\Phi$ and
$\Phi'$ of the bilinear form $\varphi$ in these bases are related by the equality

$$\Phi' = C^* \Phi C. \tag{6.10}$$

This is the substitution formula for the matrix of a bilinear form for a change of
basis.
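
The substitution formula (6.10) can be verified on a small example. The Python sketch below uses a hypothetical matrix `PHI` and a hypothetical nonsingular matrix `C` (both chosen only for illustration), computes $\Phi' = C^* \Phi C$ with $C^*$ the transpose, and compares the values given by (6.8) and (6.9) for coordinates related by $[x] = C[x']$ and $[y] = C[y']$.

```python
def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def change_basis(phi, c):
    """Matrix of the bilinear form in the new basis: Phi' = C* Phi C."""
    return matmul(matmul(transpose(c), phi), c)

def value(row, phi, col):
    """The scalar row * Phi * [col], as in formula (6.8)."""
    n = len(phi)
    return sum(row[i] * phi[i][j] * col[j] for i in range(n) for j in range(n))

# Hypothetical data for illustration.
PHI = [[1, 2], [0, 3]]
C = [[1, 1], [0, 1]]
PHI_NEW = change_basis(PHI, C)

x_new, y_new = [2, -1], [3, 4]          # coordinates in the primed basis
# Old coordinates: [x] = C [x'], equivalently x = x' C*.
x_old = [sum(C[i][k] * x_new[k] for k in range(2)) for i in range(2)]
y_old = [sum(C[i][k] * y_new[k] for k in range(2)) for i in range(2)]
```

Both `value(x_old, PHI, y_old)` and `value(x_new, PHI_NEW, y_new)` compute the same scalar $\varphi(x, y)$, once in each basis.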

Since the rank of a matrix is invariant under multiplication on the left or right
by a nonsingular square matrix of appropriate order (Theorem 2.63), it follows that
the rank of the matrix $\Phi$ is the same as that of the matrix $\Phi'$ for any transition
matrix $C$. Thus the rank $r$ of the matrix of a bilinear form does not depend on the
basis in which the matrix is written, and consequently, we may call it simply the rank
of the bilinear form $\varphi$. In particular, if $r = n$, that is, if the rank coincides with the
dimension of the vector space $L$, then the bilinear form $\varphi$ is said to be nonsingular.

The rank of a bilinear form can be defined in another way. By Theorem 6.3, to
every bilinear form $\varphi$ there corresponds a unique linear transformation $\mathcal{A} : L \to L^*$,
and the connection between the two is laid out in (6.7). It is easily verified that if
we choose in the spaces $L$ and $L^*$ two dual bases, then the matrices of the bilinear
form $\varphi$ and the linear transformation $\mathcal{A}$ will coincide. This shows that the rank
of the bilinear form is the same as the rank of the linear transformation $\mathcal{A}$. From
this we derive that in particular, the form $\varphi$ is nonsingular if and only if the linear
transformation $\mathcal{A} : L \to L^*$ is an isomorphism.

A given quadratic form $\psi$ can be obtained from different bilinear forms $\varphi$; this
is related to the presence of like terms in the expression (6.1) for a quadratic
form, about which we spoke above. In order to obtain uniqueness and agreement
with the properties of linearity, we shall proceed not as in secondary school, where,
for example, one writes the sum of terms $a_{12} x_1 x_2 + a_{21} x_2 x_1 = (a_{12} + a_{21}) x_1 x_2$, but
instead using a notation in which we do not collect like terms.




Remark 6.5 (On elements of fields) Additional refinements in this section are
directed at the reader who is interested in the case of vector spaces over an arbitrary
field $\mathbb{K}$. Here we shall introduce a certain limitation that will allow us to provide
a single account for the cases $\mathbb{K} = \mathbb{R}$, $\mathbb{K} = \mathbb{C}$, and all types of fields that we will
be concerned with. Namely, in what follows we shall assume that $\mathbb{K}$ is a field of
characteristic different from 2.¹ (We mentioned a similar limitation in the general
concept of field on p. 83.) Using the simplest properties that can be derived from
the definition of a field, it is easy to prove that in a field of characteristic different
from 2, there exists for an arbitrary element $a$ a unique element $b$ such that $2b = a$
(where $2b$ denotes the sum $b + b$). We then set $b = a/2$, and so whenever $a = 0$, it
follows that $b = 0$.


Theorem 6.6 Every quadratic form $\psi(x)$ on the space $L$ over a field $\mathbb{K}$ of
characteristic different from 2 can be represented in the form

$$\psi(x) = \varphi(x, x), \tag{6.11}$$

where $\varphi$ is a symmetric bilinear form, and moreover, for the given quadratic form
$\psi$, the bilinear form $\varphi$ is unique.


Proof By what we have said above, an arbitrary quadratic form $\psi(x)$ can be
represented in the form

$$\psi(x) = \varphi_1(x, x), \tag{6.12}$$

where $\varphi_1(x, y)$ is some bilinear form, not necessarily symmetric. Let us set

$$\varphi(x, y) = \frac{\varphi_1(x, y) + \varphi_1(y, x)}{2}.$$

It is clear that $\varphi(x, y)$ is a bilinear form, and moreover, it is symmetric.
Since $\varphi(x, x) = \varphi_1(x, x)$, the relationship (6.11) follows from formula (6.12), as asserted.

We shall now prove that if relationship (6.11) holds for some symmetric bilinear
form $\varphi(x, y)$, then $\varphi(x, y)$ is uniquely determined by the quadratic form $\psi(x)$. To
see this, let us calculate $\psi(x + y)$. By assumption and the properties of the bilinear
form $\varphi$, we have

$$\psi(x + y) = \varphi(x + y, x + y) = \varphi(x, x) + \varphi(y, y) + \varphi(x, y) + \varphi(y, x). \tag{6.13}$$

In view of the symmetry of the form $\varphi$, we have

$$\psi(x + y) = \psi(x) + \psi(y) + 2\varphi(x, y),$$

which implies that

$$\varphi(x, y) = \frac{1}{2} \bigl( \psi(x + y) - \psi(x) - \psi(y) \bigr). \tag{6.14}$$

This last relationship uniquely determines a bilinear form $\varphi(x, y)$ associated with
the given quadratic form $\psi(x)$.

¹ Fields of characteristic different from 2 are what are most frequently encountered. However, fields
of characteristic 2, which we are excluding from consideration here, have important applications,
for example in discrete mathematics and cryptography.
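
Both steps of the proof of Theorem 6.6 can be traced on a concrete example. The following Python sketch works over the rational numbers (a field of characteristic different from 2) using `fractions.Fraction`; the matrix `PHI1` is a hypothetical nonsymmetric example, and the function names are illustrative.

```python
from fractions import Fraction

def bilinear(phi, x, y):
    """Value of the bilinear form with matrix phi on coordinate lists x, y."""
    n = len(phi)
    return sum(phi[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def symmetrize(phi1):
    """Matrix of phi(x, y) = (phi1(x, y) + phi1(y, x)) / 2."""
    n = len(phi1)
    return [[(phi1[i][j] + phi1[j][i]) / 2 for j in range(n)] for i in range(n)]

def polarize(psi, x, y):
    """Right-hand side of (6.14): (psi(x + y) - psi(x) - psi(y)) / 2."""
    x_plus_y = [a + b for a, b in zip(x, y)]
    return (psi(x_plus_y) - psi(x) - psi(y)) / 2

# Hypothetical nonsymmetric bilinear form phi_1 and its quadratic form psi.
PHI1 = [[Fraction(1), Fraction(2)],
        [Fraction(0), Fraction(3)]]
PHI = symmetrize(PHI1)

def psi(x):
    return bilinear(PHI1, x, x)
```

Here `polarize(psi, x, y)` uses only the values of the quadratic form, yet it agrees with `bilinear(PHI, x, y)` for the symmetrized matrix, which is the uniqueness statement of the theorem in this example.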


With the same assumptions, we have the following result for antisymmetric 
forms. 


Theorem 6.7 For every antisymmetric bilinear form $\varphi(x, y)$ on the space $L$ over a
field $\mathbb{K}$ of characteristic different from 2, we have

$$\varphi(x, x) = 0. \tag{6.15}$$

Conversely, if equality (6.15) is satisfied for every vector $x \in L$, then the bilinear
form $\varphi(x, y)$ is antisymmetric.


Proof If the form $\varphi(x, y)$ is antisymmetric, then transposing the arguments in
the expression $\varphi(x, x)$ leads to the relationship $\varphi(x, x) = -\varphi(x, x)$, and then
$2\varphi(x, x) = 0$, from which follows equality (6.15), since by the condition of the
theorem, the field $\mathbb{K}$ has characteristic different from 2. Conversely, if $\varphi(x, x) = 0$
for every vector $x \in L$, then this holds in particular for the vector $x + y$, that is, we
obtain

$$\varphi(x + y, x + y) = \varphi(x, x) + \varphi(x, y) + \varphi(y, x) + \varphi(y, y) = 0.$$

Since we have $\varphi(x, x) = \varphi(y, y) = 0$ by the hypothesis of the theorem, it follows
that $\varphi(x, y) + \varphi(y, x) = 0$, which yields that the bilinear form $\varphi(x, y)$ is
antisymmetric.


Let us note that the way of writing the quadratic form $\psi(x)$ in the form (6.11)
established by Theorem 6.6, where $\varphi(x, y)$ is a symmetric bilinear form, shows us
how to write like terms in the representation (6.1). Indeed, if we have

$$x = x_1 e_1 + \cdots + x_n e_n, \qquad y = y_1 e_1 + \cdots + y_n e_n,$$

and $\varphi(x, y)$ is a bilinear form, then

$$\varphi(x, y) = \sum_{i,j=1}^{n} \varphi_{ij} x_i y_j,$$

where $\varphi_{ij} = \varphi(e_i, e_j)$. The symmetry of the form $\varphi(x, y)$ implies that $\varphi_{ij} = \varphi_{ji}$ for
all $i, j = 1, \dots, n$. Then the representation

$$\psi(x_1, \dots, x_n) = \sum_{i,j=1}^{n} \varphi_{ij} x_i x_j$$

contains like terms: if $i \neq j$, the term with $x_i x_j$ occurs in the sum twice, as
$\varphi_{ij} x_i x_j$ and as $\varphi_{ji} x_j x_i$. Since $\varphi_{ij} = \varphi_{ji}$, collecting
like terms leads to these two summands being written in the form $2\varphi_{ij} x_i x_j$.

For example, the coefficients of the quadratic form $x_1^2 + x_1 x_2 + x_2^2$ are given
by $\varphi_{11} = 1$, $\varphi_{22} = 1$, and $\varphi_{12} = \varphi_{21} = \frac{1}{2}$. Such a way of writing things may seem
strange at first glance, but as we shall soon see, it offers many advantages.
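
The convention just described amounts to splitting each collected coefficient of $x_i x_j$ ($i \neq j$) into two equal halves. A minimal Python sketch, assuming the polynomial is supplied as a dictionary of collected coefficients (the input format and the name `symmetric_matrix` are choices made here for illustration):

```python
from fractions import Fraction

def symmetric_matrix(n, collected):
    """Build the symmetric matrix (phi_ij) of a quadratic form in n variables.

    collected[(i, j)] with i <= j (1-based indices) is the coefficient of
    x_i * x_j after like terms have been collected.
    """
    phi = [[Fraction(0)] * n for _ in range(n)]
    for (i, j), coeff in collected.items():
        if i == j:
            phi[i - 1][i - 1] = Fraction(coeff)
        else:
            # Split the collected coefficient evenly: phi_ij = phi_ji = coeff / 2.
            phi[i - 1][j - 1] = phi[j - 1][i - 1] = Fraction(coeff) / 2
    return phi

# The example from the text: x_1^2 + x_1 x_2 + x_2^2.
PHI = symmetric_matrix(2, {(1, 1): 1, (1, 2): 1, (2, 2): 1})
```

Division by 2 is exactly where the assumption that the characteristic of the field is different from 2 enters.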


6.2 Reduction to Canonical Form 


The main goal of this section is to transform quadratic forms into the simplest
possible form, called canonical. As in the case of the matrix of a linear transformation,
canonical form is obtained by the selection of a special basis of the given vector
space. Namely, the required basis must possess the property that the matrix of the
symmetric bilinear form corresponding to the given quadratic form assumes
diagonal form in that basis. This property is directly connected to the important notion
of orthogonality, which will be used repeatedly in this and subsequent chapters. We
note that the notion of orthogonality can be formulated in a way that is well defined
for bilinear forms that are not necessarily symmetric, but it can be most simply
defined for symmetric and antisymmetric bilinear forms. In this section, we shall
consider only symmetric bilinear forms.

Thus let there be given on the finite-dimensional vector space $L$ a symmetric
bilinear form $\varphi(x, y)$.


Definition 6.8 Vectors $x$ and $y$ are said to be orthogonal if $\varphi(x, y) = 0$.


We observe that in light of the symmetry condition $\varphi(y, x) = \varphi(x, y)$, the
equality $\varphi(x, y) = 0$ is equivalent to $\varphi(y, x) = 0$. This is true as well for antisymmetric
bilinear forms. However, if we do not impose a symmetry or antisymmetry
condition on the bilinear form, then the vector $x$ can be orthogonal to the vector $y$ without
$y$ being orthogonal to $x$. This leads to the concepts of left and right orthogonality
and some very beautiful geometry, but it would take us beyond the scope of this
book. A vector $x \in L$ is said to be orthogonal to a subspace $L' \subset L$ relative to $\varphi$ if it
is orthogonal to every vector $y \in L'$, that is, if $\varphi(x, y) = 0$ for all $y \in L'$.

It follows at once from the definition of bilinearity that the collection of all
vectors $x$ orthogonal to a subspace $L'$ with respect to a given bilinear form $\varphi$ is itself a
subspace of $L$. It is called the orthogonal complement of the subspace $L'$ with respect
to the form $\varphi$ and is denoted by $(L')^{\perp}_{\varphi}$.

In particular, for $L' = L$, the subspace $(L)^{\perp}_{\varphi}$ represents the totality of vectors $x \in L$
for which the equation $\varphi(x, y) = 0$ is satisfied for all $y \in L$. This subspace is called
the radical of the bilinear form $\varphi(x, y)$. From the definition of a bilinear form, it
follows at once that the radical consists of all vectors $x \in L$ such that

$$\varphi(x, e_i) = 0 \quad \text{for all } i = 1, \dots, n, \tag{6.16}$$

where $e_1, \dots, e_n$ is some basis of the space $L$. The equalities (6.16) are linear
homogeneous equations that define the radical as a subspace of $L$. If we write down
the vector $x$ in the chosen basis, that is, in the form $x = x_1 e_1 + \cdots + x_n e_n$, then in
view of formula (6.4), we obtain from the equalities (6.16) a system of linear
homogeneous equations in the unknowns $x_1, \dots, x_n$. The matrix of this system coincides
with the matrix $\Phi$ of the bilinear form $\varphi$ in the basis $e_1, \dots, e_n$. Thus the space
$(L)^{\perp}_{\varphi}$ satisfies the conditions of Example 3.65 from Sect. 3.5 (p. 114). Consequently,
$\dim (L)^{\perp}_{\varphi} = n - r$, where $r$ is the rank of the matrix of the linear system, that is, the
rank of the bilinear form $\varphi$. We therefore obtain the equality

$$r = \dim L - \dim (L)^{\perp}_{\varphi}. \tag{6.17}$$
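
Equality (6.17) gives a practical way to find the dimension of the radical: compute the rank of the matrix of the form. The Python sketch below does this by Gaussian elimination over the rationals; the function names and the sample matrix `PHI` are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix, computed by Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0                                   # number of pivots found so far
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r

def radical_dimension(phi):
    """dim of the radical = n - r, by equality (6.17)."""
    return len(phi) - rank(phi)

# Hypothetical symmetric matrix of rank 1 in a three-dimensional space.
PHI = [[1, 1, 0],
       [1, 1, 0],
       [0, 0, 0]]
```

For this `PHI` the vectors $(1, -1, 0)$ and $(0, 0, 1)$ satisfy all the equations (6.16), so the radical is two-dimensional, in agreement with $n - r = 3 - 1$.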


Theorem 6.9 Let $L' \subset L$ be a subspace such that the restriction of the bilinear form
$\varphi(x, y)$ to $L'$ is a nonsingular bilinear form. We then have the decomposition

$$L = L' \oplus (L')^{\perp}_{\varphi}. \tag{6.18}$$

Proof First of all, we note that by the conditions of the theorem, the intersection
$L' \cap (L')^{\perp}_{\varphi}$ is equal to the zero space $(0)$. Indeed, it consists of all vectors $x \in L'$
such that $\varphi(x, y) = 0$ for all $y \in L'$, and hence only of the null vector, since by the
condition, the restriction of $\varphi$ to the subspace $L'$ is a nonsingular bilinear form. Thus
it suffices to prove that $L' + (L')^{\perp}_{\varphi} = L$. We shall present two proofs of this fact in
order to demonstrate two different lines of reasoning used in the theory of vector
spaces.

First proof. We shall use the linear transformation $\mathcal{A} : L \to L^*$ constructed in
Theorem 6.3 corresponding to the bilinear form $\varphi$. Assigning to each linear function
on $L$ its restriction to the subspace $L' \subset L$, we obtain the linear transformation $\mathcal{B} :
L^* \to (L')^*$. If we apply in sequence the linear transformations $\mathcal{A}$ and $\mathcal{B}$, we obtain
the linear transformation $\mathcal{C} = \mathcal{B}\mathcal{A} : L \to (L')^*$.

The kernel $L_1$ of the transformation $\mathcal{C}$ consists of the vectors $y \in L$ such that
$\varphi(x, y) = 0$ for all $x \in L'$, since by definition, $\varphi(x, y) = (x, \mathcal{A}(y))$. This implies
that $L_1 = (L')^{\perp}_{\varphi}$. Let us show that the image $L_2$ of the transformation $\mathcal{C}$ is equal to
the entire space $(L')^*$. We shall prove an even stronger result: an arbitrary vector
$u \in (L')^*$ can be represented in the form $u = \mathcal{C}(v)$, where $v \in L'$. For this, we must
consider the restriction of the transformation $\mathcal{C}$ to the subspace $L'$. By definition,
it coincides with the transformation $\mathcal{A}' : L' \to (L')^*$ constructed in Theorem 6.3,
which corresponds to the restriction of the bilinear form $\varphi$ to $L'$. By assumption, the
restriction of the form $\varphi$ to $L'$ is nonsingular, which implies that the transformation
$\mathcal{A}'$ is an isomorphism. From this, it follows in particular that its image is the entire
space $(L')^*$.

Now we shall make use of Theorem 3.72 and apply relationship (3.47) to the
transformation $\mathcal{C}$. We obtain $\dim L_1 + \dim L_2 = \dim L$. Since $L_2 = (L')^*$, it follows
by Theorem 3.78 that $\dim L_2 = \dim L'$. Recalling also that $L_1 = (L')^{\perp}_{\varphi}$, we have
finally the equality

$$\dim (L')^{\perp}_{\varphi} + \dim L' = \dim L. \tag{6.19}$$

Since $L' \cap (L')^{\perp}_{\varphi} = (0)$, we conclude by Corollary 3.15 (p. 85) that $L' + (L')^{\perp}_{\varphi} =
L' \oplus (L')^{\perp}_{\varphi}$. From Theorems 3.24, 3.38 and the relationship (6.19), it follows that
$L' \oplus (L')^{\perp}_{\varphi} = L$.

Second proof. We need to represent an arbitrary vector $x \in L$ in the form $x =
u + v$, where $u \in L'$ and $v \in (L')^{\perp}_{\varphi}$. This is clearly equivalent to the condition $x - u \in
(L')^{\perp}_{\varphi}$, and therefore to the condition $\varphi(x - u, y) = 0$ for all $y \in L'$. Recalling the
properties of a bilinear form, we see that it suffices that the last equation be satisfied
for vectors $y = e_i$, $i = 1, \dots, r$, where $e_1, \dots, e_r$ is some basis of the space $L'$.
In view of the bilinearity of the form $\varphi$, our relationships can be written in the
form

$$\varphi(u, e_i) = \varphi(x, e_i) \quad \text{for all } i = 1, \dots, r. \tag{6.20}$$

We now represent the vector $u$ as $u = x_1 e_1 + \cdots + x_r e_r$. Relationship (6.20) gives
a system of $r$ linear equations

$$\varphi(e_1, e_i) x_1 + \cdots + \varphi(e_r, e_i) x_r = \varphi(x, e_i), \quad i = 1, \dots, r, \tag{6.21}$$

with unknowns $x_1, \dots, x_r$. The matrix of the system (6.21) has the form

$$\Phi = \begin{pmatrix} \varphi(e_1, e_1) & \cdots & \varphi(e_1, e_r) \\ \vdots & \ddots & \vdots \\ \varphi(e_r, e_1) & \cdots & \varphi(e_r, e_r) \end{pmatrix}.$$

But it is easy to see that $\Phi$ is the matrix of the restriction of the bilinear
form $\varphi$ to the subspace $L'$ written in the basis $e_1, \dots, e_r$. Since by assumption,
such a form is nonsingular, its matrix is also nonsingular, and this implies
that the system of equations (6.20) has a solution. In other words, we can find
a vector $u \in L'$ satisfying all the relationships (6.20), which proves our
assertion.
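
The second proof is constructive, and the construction can be sketched in Python: set up the system (6.21), solve it, and split $x = u + v$. The helper `solve` below is a plain Gauss-Jordan elimination written for this illustration; the sample form and basis in the usage note are hypothetical.

```python
from fractions import Fraction

def bilinear(phi, x, y):
    """Value of the bilinear form with matrix phi on coordinate lists x, y."""
    n = len(phi)
    return sum(phi[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def solve(a, b):
    """Solve a z = b for a nonsingular square matrix a (Gauss-Jordan elimination)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for c in range(n):
        # Raises StopIteration if the matrix is singular.
        pivot = next(i for i in range(c, n) if m[i][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        m[c] = [v / m[c][c] for v in m[c]]
        for i in range(n):
            if i != c:
                f = m[i][c]
                m[i] = [vi - f * vc for vi, vc in zip(m[i], m[c])]
    return [row[n] for row in m]

def decompose(phi, basis, x):
    """Split x = u + v with u in L' = span(basis) and v orthogonal to L'."""
    r = len(basis)
    # System (6.21): sum over j of phi(e_j, e_i) * x_j = phi(x, e_i), i = 1..r.
    a = [[bilinear(phi, basis[j], basis[i]) for j in range(r)] for i in range(r)]
    b = [bilinear(phi, x, basis[i]) for i in range(r)]
    coeffs = solve(a, b)
    u = [sum(coeffs[j] * basis[j][k] for j in range(r)) for k in range(len(x))]
    v = [xk - uk for xk, uk in zip(x, u)]
    return u, v
```

For instance, with the symmetric matrix `[[1, 1, 0], [1, 0, 2], [0, 2, 1]]`, the subspace spanned by `[1, 0, 0]`, and `x = [1, 2, 3]`, the call `decompose(...)` returns `u = [3, 0, 0]` and `v = [-2, 2, 3]`, and $v$ is orthogonal to the chosen basis vector.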


We shall now apply these ideas related to bilinear forms to the theory of quadratic
forms. Our goal is to find a basis in which the matrix of a given quadratic form $\psi(x)$
has the simplest form possible.


Theorem 6.10 For every quadratic form $\psi(x)$, there exists a basis in which the
form can be written as

$$\psi(x) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2, \tag{6.22}$$

where $x_1, \dots, x_n$ are the coordinates of the vector $x$ in this basis.


Proof Let $\varphi(x, y)$ be a symmetric bilinear form associated with the quadratic form
$\psi(x)$ by the formula (6.11). If $\psi(x)$ is identically equal to zero, then the theorem
clearly is true (for $\lambda_1 = \cdots = \lambda_n = 0$). If the quadratic form $\psi(x)$ is not identically
equal to zero, then there exists a vector $e_1$ such that $\psi(e_1) \neq 0$, that is, $\varphi(e_1, e_1) \neq 0$.
This implies that the restriction of the bilinear form $\varphi$ to the subspace $L' = \langle e_1 \rangle$ is
nonsingular, and therefore, by Theorem 6.9, for the subspace $L' = \langle e_1 \rangle$ we have
the decomposition (6.18), that is, $L = \langle e_1 \rangle \oplus \langle e_1 \rangle^{\perp}_{\varphi}$. Since $\dim \langle e_1 \rangle = 1$, then by
Theorem 3.38, we obtain that $\dim \langle e_1 \rangle^{\perp}_{\varphi} = n - 1$.

Proceeding by induction, we may assume the theorem to have been proved for the
space $\langle e_1 \rangle^{\perp}_{\varphi}$. Thus in this space there exists a basis $e_2, \dots, e_n$ such that $\varphi(e_i, e_j) = 0$
for all $i \neq j$, $i, j \ge 2$. Moreover, $\varphi(e_1, e_j) = 0$ for $j \ge 2$, since $e_j \in \langle e_1 \rangle^{\perp}_{\varphi}$.
Then in the basis $e_1, \dots, e_n$ of the space $L$, the quadratic form
$\psi(x)$ can be written as (6.22) for some $\lambda_1, \dots, \lambda_n$.
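
The inductive proof of Theorem 6.10 translates into an algorithm. The Python sketch below follows that induction over the rationals: pick a vector $e$ with $\psi(e) \neq 0$, replace the remaining vectors by their components orthogonal to $e$, and repeat. One detail is supplied here and is not spelled out in the proof: when $\psi$ vanishes on all current vectors but $\varphi(f_i, f_j) \neq 0$ for some pair, the vector $f_i + f_j$ has $\psi(f_i + f_j) = 2\varphi(f_i, f_j) \neq 0$. The function names are illustrative.

```python
from fractions import Fraction

def bilinear(phi, x, y):
    """Value of the bilinear form with matrix phi on coordinate lists x, y."""
    n = len(phi)
    return sum(phi[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def pick_index(phi, vectors):
    """Index of a vector e with phi(e, e) != 0, or None if phi vanishes on the span.

    May replace vectors[i] by vectors[i] + vectors[j]; the span is unchanged.
    """
    for k, f in enumerate(vectors):
        if bilinear(phi, f, f) != 0:
            return k
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if bilinear(phi, vectors[i], vectors[j]) != 0:
                vectors[i] = [a + b for a, b in zip(vectors[i], vectors[j])]
                return i
    return None

def diagonalizing_basis(phi):
    """Basis (as coordinate lists) in which the symmetric matrix phi becomes diagonal."""
    n = len(phi)
    pending = [[Fraction(int(i == k)) for k in range(n)] for i in range(n)]
    basis = []
    while pending:
        k = pick_index(phi, pending)
        if k is None:            # the form is identically zero on what remains
            basis.extend(pending)
            break
        e = pending.pop(k)
        q = bilinear(phi, e, e)
        projected = []
        for f in pending:        # component of f orthogonal to e
            coeff = bilinear(phi, f, e) / q
            projected.append([fk - coeff * ek for fk, ek in zip(f, e)])
        pending = projected
        basis.append(e)
    return basis

def gram(phi, basis):
    """Matrix (phi(u, v)) of the form in the given basis."""
    return [[bilinear(phi, u, v) for v in basis] for u in basis]
```

For the matrix `[[0, 1], [1, 0]]` of the form $2x_1x_2$, this sketch produces the basis $(1, 1)$, $(-\tfrac{1}{2}, \tfrac{1}{2})$, in which the form has the diagonal matrix with entries $2$ and $-\tfrac{1}{2}$.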


We observe that one and the same quadratic form $\psi$ can be of the form (6.22) in
various bases, and in this case, the numbers $\lambda_1, \dots, \lambda_n$ might differ in various bases.
For example, if in a one-dimensional space whose basis consists of one nonzero
vector $e$, we define the quadratic form $\psi$ by the relation $\psi(xe) = x^2$, then in the
basis consisting of the vector $e' = \lambda e$, $\lambda \neq 0$, it can be written as $\psi(xe') = (\lambda x)^2 = \lambda^2 x^2$.

If in a certain basis a quadratic form can be written as in (6.22), then we say that 
in that basis, it is in canonical form. Theorem 6.10 is called the theorem on reducing 
a quadratic form to canonical form. From what we have said above, it follows that 
reducing a quadratic form to canonical form is not unique. 

If in the basis $e_1, \dots, e_n$ of the space $L$, the quadratic form $\psi(x)$ has the form
established in Theorem 6.10, then its matrix in this basis is equal to

$$\Psi = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \tag{6.23}$$

It is clear that the rank of the matrix $\Psi$ is equal to the number of nonzero values
among $\lambda_1, \dots, \lambda_n$. As we saw in the previous section, the rank of the matrix $\Psi$ (that
is, the rank of the quadratic form $\psi(x)$) does not depend on the choice of basis in
which the matrix $\Psi$ is written. Therefore, this number is the same for every basis
for which Theorem 6.10 holds.

It is useful to write down the results we have obtained in matrix form. We may
reformulate Theorem 6.10 using formula (6.10) obtained in the previous section for
replacing the matrix of a bilinear form by a change in basis.


Theorem 6.11 For an arbitrary symmetric matrix $\Phi$, there exists a nonsingular
matrix $C$ such that the matrix $C^* \Phi C$ is diagonal. If we select a different matrix $C$, we
may obtain different diagonal matrices $C^* \Phi C$, but the number of nonzero elements
on the main diagonal will always be the same.


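A completely analogous argument can be applied to the case of antisymmetric
bilinear forms. The following theorem is an analogue of Theorem 6.10.


Theorem 6.12 For every antisymmetric bilinear form $\varphi(x, y)$, there exists a
basis $e_1, \dots, e_n$ whose first $2r$ vectors can be combined into pairs $(e_{2i-1}, e_{2i})$,
$i = 1, \dots, r$, such that

$$\varphi(e_{2i-1}, e_{2i}) = 1, \quad \varphi(e_{2i}, e_{2i-1}) = -1 \quad \text{for all } i = 1, \dots, r,$$

$$\varphi(e_i, e_j) = 0 \quad \text{if } |i - j| > 1 \text{ or } i > 2r \text{ or } j > 2r.$$

Thus in the given basis, the matrix of the bilinear form $\varphi$ takes the form

$$\Phi = \begin{pmatrix}
0 & 1 & & & & & & \\
-1 & 0 & & & & & & \\
 & & \ddots & & & & & \\
 & & & 0 & 1 & & & \\
 & & & -1 & 0 & & & \\
 & & & & & 0 & & \\
 & & & & & & \ddots & \\
 & & & & & & & 0
\end{pmatrix}, \tag{6.24}$$

with $r$ blocks of order 2 on the main diagonal and all entries not shown equal to zero.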


Proof This theorem is an exact parallel to Theorem 6.10. If $\varphi(x, y) = 0$ for all $x$
and $y$, then the assertion of the theorem is obvious (for $r = 0$). However, if this is not
the case, then there exist two vectors $e'_1$ and $e_2$ for which $\varphi(e'_1, e_2) = \alpha \neq 0$. Setting
$e_1 = \alpha^{-1} e'_1$, we obtain that $\varphi(e_1, e_2) = 1$. The matrix of the form $\varphi$ restricted to the
subspace $L' = \langle e_1, e_2 \rangle$ in the basis $e_1, e_2$ has the form

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \tag{6.25}$$

and consequently, it is nonsingular. Then on the basis of Theorem 6.9, we obtain the
decomposition $L = L' \oplus (L')^{\perp}_{\varphi}$, where $\dim (L')^{\perp}_{\varphi} = n - 2$, with $n = \dim L$. Proceeding
by induction, we may assume that the theorem has been proved for forms $\varphi$ defined
on the space $(L')^{\perp}_{\varphi}$. If $f_1, \dots, f_{n-2}$ is such a basis of the space $(L')^{\perp}_{\varphi}$, the existence
of which is asserted by Theorem 6.12, then it is obvious that $e_1, e_2, f_1, \dots, f_{n-2}$
is the required basis of the original space $L$.


The number $n - 2r$ is equal to the dimension of the radical of the bilinear form $\varphi$,
and therefore, it is the same for all bases in which the matrix of the bilinear form $\varphi$
is brought into the form (6.24). The rank of the matrix (6.25) is equal to 2, while the
matrix (6.24) contains $r$ such blocks on the main diagonal. Therefore, the rank of
the matrix (6.24) is equal to $2r$. Thus from Theorem 6.12, we obtain the following
corollary.


Corollary 6.13 The rank of an antisymmetric bilinear form is an even number. 
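
The proof of Theorem 6.12 is also constructive. The Python sketch below follows it over the rationals: find a pair with $\varphi(e'_1, e_2) = \alpha \neq 0$, normalize to $e_1 = \alpha^{-1} e'_1$, and replace every remaining vector $f$ by $f - \varphi(f, e_2) e_1 + \varphi(f, e_1) e_2$, which is orthogonal to both $e_1$ and $e_2$. The explicit projection formula and the name `canonical_basis` are supplied here for illustration; the text only asserts that the decomposition exists.

```python
from fractions import Fraction

def bilinear(phi, x, y):
    """Value of the bilinear form with matrix phi on coordinate lists x, y."""
    n = len(phi)
    return sum(phi[i][j] * x[i] * y[j] for i in range(n) for j in range(n))

def canonical_basis(phi):
    """For an antisymmetric matrix phi, return (basis, r) as in Theorem 6.12."""
    n = len(phi)
    pending = [[Fraction(int(i == k)) for k in range(n)] for i in range(n)]
    paired = []
    while True:
        found = next(((i, j) for i in range(len(pending))
                      for j in range(i + 1, len(pending))
                      if bilinear(phi, pending[i], pending[j]) != 0), None)
        if found is None:        # the form vanishes on the remaining vectors
            break
        i, j = found
        e2 = pending.pop(j)      # j > i, so popping j first keeps index i valid
        e1_prime = pending.pop(i)
        alpha = bilinear(phi, e1_prime, e2)
        e1 = [c / alpha for c in e1_prime]      # now phi(e1, e2) = 1
        projected = []
        for f in pending:
            a, b = bilinear(phi, f, e2), bilinear(phi, f, e1)
            projected.append([fk - a * u + b * v for fk, u, v in zip(f, e1, e2)])
        pending = projected
        paired += [e1, e2]
    return paired + pending, len(paired) // 2
```

For the antisymmetric matrix `[[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]` the sketch finds a single pair, so $r = 1$ and the rank is $2r = 2$, an even number as Corollary 6.13 asserts.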


Let us now translate everything that we have proved for antisymmetric bilinear
forms into the language of matrices. Here our assertions will be the same as for
symmetric matrices, and they are proved in exactly the same manner. We obtain that
for an arbitrary antisymmetric matrix $\Phi$, there exists a nonsingular matrix $C$ such
that the matrix

$$\Phi' = C^* \Phi C \tag{6.26}$$

has the form (6.24).

Matrices $\Phi$ and $\Phi'$ that are related by (6.26) for some nonsingular matrix $C$ are
said to be equivalent. The same term is applied to the quadratic forms associated
with these matrices (for a particular choice of basis).

It is easy to verify that the concept thus introduced is an equivalence relation
on the set of square matrices of a given order or indeed on the set of quadratic
forms. The reflexive property is obvious. It is necessary only to substitute the matrix
$C = E$ into formula (6.26). Multiplying both sides of equality (6.26) on the right by
the matrix $B = C^{-1}$ and on the left by the matrix $B^*$, taking into account the
relationship $(C^{-1})^* = (C^*)^{-1}$, we obtain the equality $\Phi = B^* \Phi' B$, which establishes
the symmetric property.

Finally, let us verify the property of transitivity. Suppose we are given the
relationships (6.26) and $\Phi'' = D^* \Phi' D$ for some nonsingular matrices $C$ and $D$.
Then if we substitute the first of these into the second, we obtain the equality
$\Phi'' = D^* C^* \Phi C D$. Setting $B = CD$ and taking into account $B^* = D^* C^*$, we
obtain the equality $\Phi'' = B^* \Phi B$, which establishes the equivalence of the matrices $\Phi$
and $\Phi''$.

It is now possible to reformulate Theorems 6.10 and 6.12 in the following form. 


Theorem 6.14 Every symmetric matrix is equivalent to a diagonal matrix. 


Theorem 6.15 Every antisymmetric matrix $\Phi$ is equivalent to a matrix of the form
(6.24), where the number $r$ is equal to one-half the rank of the matrix $\Phi$.


From Theorems 6.14 and 6.15, it follows that all equivalent symmetric matrices
and all equivalent antisymmetric matrices have the same rank, and for antisymmetric
matrices, equivalence is the same as the equality of their ranks, that is, two
antisymmetric matrices of a given order are equivalent if and only if they have the same
rank.

Let us conclude with the observation that all the concepts investigated in this
section can be expressed in the language of bilinear forms given by Theorem 6.3. By
this theorem, every bilinear form $\varphi(x, y)$ on a vector space $L$ can be written uniquely
in the form $\varphi(x, y) = (x, \mathcal{A}(y))$, where $\mathcal{A} : L \to L^*$ is some linear transformation.
As proved in Sect. 6.1, the symmetry of the form $\varphi$ is equivalent to $\mathcal{A}^* = \mathcal{A}$, while
antisymmetry is equivalent to $\mathcal{A}^* = -\mathcal{A}$. In the first case, the transformation $\mathcal{A}$ is
said to be symmetric, and in the second case, antisymmetric. Thus Theorems 6.10
and 6.12 are equivalent to the following assertions. For an arbitrary symmetric
transformation $\mathcal{A}$, there exists a basis of the vector space $L$ in which the matrix of this
transformation has the diagonal form (6.23). Similarly, for an arbitrary antisymmetric
transformation $\mathcal{A}$, there exists a basis of the space $L$ in which the matrix of this
transformation has the form (6.24). More precisely, in both these statements, we are
talking about the choice of basis in $L$ and its dual basis in $L^*$, since the
transformation $\mathcal{A}$ maps $L$ to $L^*$.


6.3 Complex, Real, and Hermitian Forms 


We begin this section by examining a quadratic form $\psi$ in a complex vector space $L$.
By Theorem 6.10, it can be written, in terms of some basis $e_1, \ldots, e_n$, in the form

$$\psi(x) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2,$$

where $x_1, \ldots, x_n$ are the coordinates of the vector $x$ in this basis. This implies
that the associated symmetric bilinear form $\varphi(x, y)$ satisfies $\varphi(e_i, e_j) = 0$
for $i \ne j$ and $\varphi(e_i, e_i) = \lambda_i$. Here, the number of values $\lambda_i$
different from zero is equal to the rank $r$ of the bilinear form $\varphi$. By changing the
numeration of the basis vectors if necessary, we may assume that $\lambda_i \ne 0$ for
$i \le r$ and $\lambda_i = 0$ for $i > r$. We may then introduce a new basis
$e'_1, \ldots, e'_n$ by setting

$$e'_i = \frac{1}{\sqrt{\lambda_i}}\, e_i \ \text{ for } i \le r, \qquad e'_i = e_i \ \text{ for } i > r,$$

since $\sqrt{\lambda_i}$ is again a complex number. In the new basis, $\varphi(e'_i, e'_j) = 0$
for all $i \ne j$, and $\varphi(e'_i, e'_i) = 1$ for $i \le r$, $\varphi(e'_i, e'_i) = 0$ for
$i > r$. This implies that the quadratic form $\psi(x)$ can be written in this basis in the form

$$\psi(x) = x_1^2 + \cdots + x_r^2, \tag{6.27}$$

where $x_1, \ldots, x_r$ are the first $r$ coordinates of the vector $x$ in the new basis. We
see, then, that in a complex space $L$, every quadratic form can be brought into the canonical
form (6.27), and all quadratic forms (and therefore also symmetric matrices) of a given
rank are equivalent.

We now consider the case of a real vector space $L$. By Theorem 6.10, an arbitrary
quadratic form $\psi$ can again be written in the form

$$\psi(x) = \lambda_1 x_1^2 + \cdots + \lambda_r x_r^2,$$

where all the $\lambda_i$ are nonzero and $r$ is the rank of the form $\psi$. But we cannot
proceed so simply as in the complex case by setting $e'_i = \frac{1}{\sqrt{\lambda_i}}\, e_i$,
since for $\lambda_i < 0$, the number $\lambda_i$ does not have a real square root. Therefore,
we must consider separately, among the numbers $\lambda_1, \ldots, \lambda_r$, those that are
positive and those that are negative. Again changing the numeration of the vectors of the
basis as necessary, we may assume that $\lambda_1, \ldots, \lambda_s$ are positive, and that
$\lambda_{s+1}, \ldots, \lambda_r$ are negative. Now we can introduce a new basis by setting

$$e'_i = \frac{1}{\sqrt{\lambda_i}}\, e_i \ \text{ for } i \le s, \qquad e'_i = \frac{1}{\sqrt{-\lambda_i}}\, e_i \ \text{ for } i = s+1, \ldots, r, \qquad e'_i = e_i \ \text{ for } i > r.$$




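In this basis, for the bilinear form $\varphi$, we have $\varphi(e'_i, e'_j) = 0$ for $i \ne j$,
and $\varphi(e'_i, e'_i) = 1$ for $i = 1, \ldots, s$, $\varphi(e'_i, e'_i) = -1$ for
$i = s+1, \ldots, r$, and the quadratic form $\psi$ will thus be brought into the form

$$\psi(x) = x_1^2 + \cdots + x_s^2 - x_{s+1}^2 - \cdots - x_r^2. \tag{6.28}$$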


Let us note one important special case. 


Definition 6.16 A real quadratic form $\psi(x)$ is said to be positive definite if
$\psi(x) > 0$ for every $x \ne 0$ and negative definite if $\psi(x) < 0$ for every $x \ne 0$.


It is obvious that these notions are connected by a simple relationship: a form
$\psi(x)$ is negative definite if and only if the form $-\psi(x)$ is positive definite.
Therefore, in the sequel, it will suffice to establish the basic properties of positive
definite forms only, and the corresponding properties of negative definite forms will
be obtained automatically.

Written in the form (6.28), a quadratic form on an n-dimensional vector space
is positive definite if and only if $s = n$, and negative definite if and only if $s = 0$
and $r = n$.

The fundamental property of real quadratic forms is stated in the following theo- 
rem. 


Theorem 6.17 For every basis in terms of which the real quadratic form can be 
written in the form (6.28), the number s always has one and the same value. 


Proof Let us characterize $s$ in a way that does not depend on reducing the quadratic
form $\psi$ to the form (6.28). Namely, let us prove that $s$ is equal to the largest
dimension among subspaces $L' \subset L$ such that the restriction of $\psi$ to $L'$ is a
positive definite quadratic form. To this end, we note first of all that for an arbitrary
basis in which the form takes the form (6.28), it is possible to find a subspace $L'$ of
dimension $s$ on which the restriction of the form $\psi$ gives a positive definite form.
Namely, if the form $\psi(x)$ is written in the form (6.28) in the basis $e_1, \ldots, e_n$,
then we set $L' = \langle e_1, \ldots, e_s \rangle$. It is obvious that the restriction of
the form $\psi$ to $L'$ gives a positive definite quadratic form. Similarly, we may consider
the set $L''$ of vectors for which, in the decomposition (6.28), the first $s$ coordinates
are equal to zero: $x_1 = 0, \ldots, x_s = 0$. It is clear that this set is the vector
subspace $L'' = \langle e_{s+1}, e_{s+2}, \ldots, e_n \rangle$ of dimension $n - s$, and for an
arbitrary vector $x \in L''$, we have the inequality $\psi(x) \le 0$.

Let us suppose that there exists a subspace $M \subset L$ of dimension $m > s$ such that
the restriction of $\psi$ to $M$ gives a positive definite quadratic form. It is then obvious
that $\dim M + \dim L'' = m + n - s > n$. By Corollary 3.42, the subspaces $M$ and
$L''$ must have a common vector $x \ne 0$. But since $x \in L''$, it follows that
$\psi(x) \le 0$, and since $x \in M$, we have $\psi(x) > 0$. This contradiction completes the
proof of the theorem.


Definition 6.18 The number $s$ from Theorem 6.17 that is the same no matter how
a quadratic form is brought into the form (6.28) is called the index of inertia of the
quadratic form $\psi$. In connection with this, Theorem 6.17 is often called the law of
inertia.
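The law of inertia can be checked numerically on small examples. The following is only a sketch, not a method taken from the text: it diagonalizes a symmetric matrix by simultaneous row and column operations (each one a change of basis of the kind described by formula (6.26)) over exact rationals, and then counts positive and nonzero diagonal entries. The function names `congruent`, `add_row_col`, `swap_row_col` and `inertia` are hypothetical and chosen for illustration.

```python
from fractions import Fraction


def congruent(phi, c):
    """Return C^T * Phi * C, the matrix of the same form after the change of basis C."""
    n = len(phi)
    ct_phi = [[sum(c[k][i] * phi[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
    return [[sum(ct_phi[i][k] * c[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add_row_col(a, src, dst, factor):
    """Symmetric operation: row_dst += factor*row_src, then col_dst += factor*col_src."""
    n = len(a)
    for col in range(n):
        a[dst][col] += factor * a[src][col]
    for row in range(n):
        a[row][dst] += factor * a[row][src]


def swap_row_col(a, i, j):
    """Symmetric operation: exchange basis vectors i and j."""
    a[i], a[j] = a[j], a[i]
    for row in a:
        row[i], row[j] = row[j], row[i]


def inertia(phi):
    """Return (s, r): the index of inertia and the rank of a symmetric matrix."""
    n = len(phi)
    a = [[Fraction(v) for v in row] for row in phi]
    for k in range(n):
        if a[k][k] == 0:
            # Try to bring a nonzero diagonal entry into position k.
            j = next((j for j in range(k + 1, n) if a[j][j] != 0), None)
            if j is not None:
                swap_row_col(a, k, j)
            else:
                # All remaining diagonal entries vanish; use an off-diagonal one.
                j = next((j for j in range(k + 1, n) if a[k][j] != 0), None)
                if j is None:
                    continue  # row and column k are already zero
                add_row_col(a, j, k, Fraction(1))  # new a[k][k] = 2*a[k][j]
        pivot = a[k][k]
        for i in range(k + 1, n):
            if a[i][k] != 0:
                add_row_col(a, k, i, -a[i][k] / pivot)
    diagonal = [a[i][i] for i in range(n)]
    s = sum(1 for d in diagonal if d > 0)
    r = sum(1 for d in diagonal if d != 0)
    return s, r


phi = [[0, 1], [1, 0]]            # the form 2*x1*x2
c = [[1, 2], [3, 5]]              # an arbitrary nonsingular matrix (determinant -1)
print(inertia(phi), inertia(congruent(phi, c)))
```

Both calls report the same pair, as Theorem 6.17 predicts: changing the basis changes the matrix but not the number of positive squares.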




Positive definite quadratic forms play an important role in the theory that we
are expounding. By the theory developed thus far, to establish whether a quadratic
form is positive definite, it is necessary to reduce it to canonical form and verify
whether the relationship $s = n$ is satisfied. However, there is a feature that makes it
possible to determine positive definiteness from the matrix of the associated bilinear
form written in an arbitrary basis. Suppose this matrix in the basis $e_1, \ldots, e_n$ has
the form

$$\Phi = (\varphi_{ij}), \quad \text{where } \varphi_{ij} = \varphi(e_i, e_j).$$

The minor $\Delta_i$ of the matrix $\Phi$ at the intersection of the first $i$ rows and first
$i$ columns is called a leading principal minor.


Theorem 6.19 (Sylvester's criterion) A quadratic form $\psi$ is positive definite if and
only if all leading principal minors of the matrix of the associated bilinear form are
positive.


Proof We shall show that if a quadratic form is positive definite, then all the
$\Delta_i$ are positive. We note first that $\Delta_n = |\Phi|$ is the determinant of the
matrix of the form $\psi$. In some basis, the form $\psi$ is in canonical form, that is, its
matrix in this basis has the form

$$\Phi' = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$

Since the quadratic form $\psi$ is positive definite, it follows that all the $\lambda_i$ are
greater than 0, and clearly, $|\Phi'| > 0$. In view of formula (6.26) for replacing the
matrix of a bilinear form by a change of basis along with the equality $|C^*| = |C|$, we
obtain the relationship $|\Phi'| = |\Phi| \cdot |C|^2$, from which it follows that
$\Delta_n = |\Phi| > 0$. Let us now consider the subspaces
$L_i = \langle e_1, \ldots, e_i \rangle \subset L$ of dimension $i \ge 1$. The restriction
of the quadratic form $\psi(x)$ to $L_i$ is clearly also a positive definite form. But the
determinant of its matrix in the basis $e_1, \ldots, e_i$ is equal to $\Delta_i$. Therefore,
$\Delta_i > 0$ by what we have just shown.

Let us now show that conversely, from the condition $\Delta_i > 0$ for all
$i = 1, \ldots, n$, it follows that the quadratic form $\psi$ is positive definite. We shall
prove this by induction on the dimension $n$ of the space $L$. For $n = 1$, the assertion is
obvious, since then $\psi(x) = \Delta_1 x_1^2$.

It is clear that $L_i \subset L$ for $i = 1, \ldots, n - 1$, and the leading principal minors
$\Delta_1, \ldots, \Delta_{n-1}$ of the matrix of the form $\psi$ restricted to the subspace
$L_{n-1}$, in the basis $e_1, \ldots, e_{n-1}$, are the same as for the form $\psi$ in $L$.
Therefore, the restriction of the quadratic form $\psi$ to $L_{n-1}$ may be assumed positive
definite by the induction hypothesis. Consequently, the restriction of $\varphi(x, y)$ to the
subspace $L_{n-1}$ is a nonsingular bilinear form, and so by Theorem 6.9, we have the
decomposition $L = L_{n-1} \oplus (L_{n-1})^{\perp}_{\varphi}$, where $\dim L_{n-1} = n - 1$
and $\dim (L_{n-1})^{\perp}_{\varphi} = 1$. We may therefore express the vector $e_n$ in the
form

$$e_n = f_n + y, \quad \text{where } y \in L_{n-1}, \ f_n \in (L_{n-1})^{\perp}_{\varphi}. \tag{6.29}$$




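We may represent an arbitrary vector $x \in L$ as a linear combination of vectors of the
basis $e_1, \ldots, e_n$, that is, in the form
$x = x_1 e_1 + \cdots + x_{n-1} e_{n-1} + x_n e_n = u + x_n e_n$,
where $u \in L_{n-1}$. Substituting the expression (6.29) and setting $u + x_n y = v$, we
obtain

$$x = v + x_n f_n, \quad \text{where } v \in L_{n-1}, \ f_n \in (L_{n-1})^{\perp}_{\varphi}. \tag{6.30}$$

This implies that the vectors $v$ and $f_n$ are orthogonal with respect to the bilinear
form $\varphi$, that is, $\varphi(v, f_n) = 0$, and therefore, from the decomposition (6.30),
we derive the equality

$$\psi(x) = \psi(v) + x_n^2\, \psi(f_n). \tag{6.31}$$

We see, then, that in the basis $e_1, \ldots, e_{n-1}, f_n$, the matrix of the bilinear form
$\varphi$ takes the form

$$\begin{pmatrix} \Phi' & 0 \\ 0 & \psi(f_n) \end{pmatrix},$$

where $\Phi'$ is the matrix of the restriction of $\varphi$ to $L_{n-1}$ in the basis
$e_1, \ldots, e_{n-1}$, and for its determinant $D_n$, we obtain the expression
$D_n = |\Phi'| \cdot \psi(f_n)$. Here $|\Phi'| = \Delta_{n-1} > 0$, and $D_n > 0$, since by
formula (6.26), $D_n$ differs from $\Delta_n > 0$ by the square of the determinant of the
change-of-basis matrix. It then follows that $\psi(f_n) > 0$. By the induction hypothesis,
the term $\psi(v)$ in formula (6.31) is nonnegative and vanishes only for $v = 0$, and
therefore, $\psi(x) > 0$ for every $x \ne 0$.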
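Sylvester's criterion translates directly into a computation. The sketch below is illustrative only: it evaluates the leading principal minors $\Delta_1, \ldots, \Delta_n$ of a symmetric matrix by Gaussian elimination over exact rationals and tests whether all of them are positive. The names `determinant`, `leading_principal_minors` and `is_positive_definite` are hypothetical.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination over exact rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for k in range(n):
        pivot_row = next((r for r in range(k, n) if a[r][k] != 0), None)
        if pivot_row is None:
            return Fraction(0)          # a zero column: the matrix is singular
        if pivot_row != k:
            a[k], a[pivot_row] = a[pivot_row], a[k]
            det = -det                  # a row exchange changes the sign
        det *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return det


def leading_principal_minors(phi):
    """Return [Delta_1, ..., Delta_n] for the square matrix phi."""
    return [determinant([row[:i] for row in phi[:i]])
            for i in range(1, len(phi) + 1)]


def is_positive_definite(phi):
    """Sylvester's criterion for a symmetric matrix phi."""
    return all(minor > 0 for minor in leading_principal_minors(phi))


print(leading_principal_minors([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))
print(is_positive_definite([[1, 2], [2, 1]]))
```

The first matrix has minors 2, 3, 4 and is therefore positive definite; the second has $\Delta_2 = -3$ and is not.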


Example 6.20 Sylvester's criterion has a beautiful application to the properties of
algebraic equations. Consider a polynomial $f(t)$ of degree $n$ with real coefficients,
about which we shall assume that its roots (real or complex) $z_1, \ldots, z_n$ are distinct.
For each root $z_k$, we consider the linear form

$$l_k(x) = x_1 + x_2 z_k + \cdots + x_n z_k^{n-1}, \tag{6.32}$$

and likewise the quadratic form

$$\psi(x) = \sum_{k=1}^{n} l_k^2(x), \tag{6.33}$$

where $x = (x_1, \ldots, x_n)$.

Although among the roots $z_k$ there may be some that are complex, the quadratic
form (6.33) is always real. This is obvious for the terms $l_k^2$ corresponding to the
real roots $z_k$. Now, as regards the complex roots, it is well known that they come
in complex conjugate pairs. Let $z_k$ and $z_j$ be complex conjugates of each other.
Separating the coefficients of the linear form $l_k$ into real and imaginary parts, we
can write it in the form $l_k = u_k + i v_k$, where $u_k$ and $v_k$ are linear forms with real
coefficients. Then $l_j = u_k - i v_k$, and for this pair of complex conjugate roots, we
have the sum $l_k^2 + l_j^2 = 2u_k^2 - 2v_k^2$, which is a real quadratic form.




Thus the quadratic form (6.33) is real, and we have the following important cri- 
terion. 


Theorem 6.21 All the roots of a polynomial f (t) are real if and only if the quadratic 
form (6.33) is positive definite. 


Proof If all the roots $z_k$ are real, then all the linear forms $l_k$ of (6.32) are real,
and the sum on the right-hand side of (6.33) contains only nonnegative terms. It is clear
that it is equal to zero only if $l_k = 0$ for all $k = 1, \ldots, n$. This condition gives us
a system consisting of $n$ linear homogeneous equations in $n$ unknowns $x_1, \ldots, x_n$.
From formula (6.32), it is easy to see that the determinant of the matrix of this
system is known to us already as a Vandermonde determinant; see formulas (2.32)
and (2.33). It is different from zero, since all the roots $z_k$ are distinct, and hence this
system has only the null solution. This implies that $\psi(x) \ge 0$ and $\psi(x) = 0$ if and
only if $x = 0$, that is, the quadratic form (6.33) is positive definite.

Let us now prove the converse assertion. Let the quadratic form (6.33) be positive
definite, and suppose the polynomial $f(t)$ has $r$ real roots and $p$ pairs of complex
roots, so that $r + 2p = n$. Then as we have seen,

$$\psi(x) = \sum_{k=1}^{r} l_k^2 + 2 \sum_{j=1}^{p} \bigl(u_j^2 - v_j^2\bigr), \tag{6.34}$$


where the first sum extends over all real roots, and the second sum is over all pairs
of complex conjugate roots.
Let us now show that if $p > 0$, then there exists a vector $x \ne 0$ such that

$$l_1(x) = 0, \ \ldots, \ l_r(x) = 0, \quad u_1(x) = 0, \ \ldots, \ u_p(x) = 0.$$

These equalities represent a system of $r + p$ linear homogeneous equations in $n$
unknowns $x_1, \ldots, x_n$. Since the number of equations $r + p$ is less than $r + 2p = n$,
it follows that this system has a nontrivial solution $x = (x_1, \ldots, x_n)$, for which the
quadratic form (6.34) takes the form

$$\psi(x) = -2 \sum_{j=1}^{p} v_j^2 \le 0.$$

Since $x \ne 0$, this inequality contradicts the positive definiteness of the form. (Even
the equality $\psi(x) = 0$ is impossible here: it would require $v_j(x) = 0$ for all
$j = 1, \ldots, p$, and then $l_k(x) = 0$ for all the linear forms (6.32), which, as we saw in
the first part of the proof, is possible only if $x = 0$.) We have thus obtained a
contradiction to the assumption that $p > 0$, that is, that the polynomial $f(t)$ has at
least one complex root.

The form (6.33) can be calculated explicitly, and then we can apply Sylvester's
criterion to it. To this end, we observe that the coefficient of the monomial $x_i^2$ on
the right-hand side of (6.33) is equal to
$s_{2(i-1)} = z_1^{2(i-1)} + \cdots + z_n^{2(i-1)}$, while the coefficient of the monomial
$x_i x_j$ (where $i \ne j$) is equal to
$2 s_{i+j-2} = 2\bigl(z_1^{i+j-2} + \cdots + z_n^{i+j-2}\bigr)$. The sums
$s_k = \sum_{i=1}^{n} z_i^k$ are called Newton sums. It is known from
the theory of symmetric functions that they can be expressed as polynomials in the
coefficients of $f(t)$. Thus the matrix of the symmetric bilinear form associated with the
quadratic form (6.33) has the form

$$\begin{pmatrix} s_0 & s_1 & \cdots & s_{n-1} \\ s_1 & s_2 & \cdots & s_n \\ \vdots & \vdots & \ddots & \vdots \\ s_{n-1} & s_n & \cdots & s_{2n-2} \end{pmatrix}.$$

Applying Sylvester's criterion to the form (6.33), we obtain the following result: all
(distinct) roots of the polynomial $f(t)$ are real if and only if the following inequality
holds for all $i = 1, \ldots, n$:

$$\begin{vmatrix} s_0 & s_1 & \cdots & s_{i-1} \\ s_1 & s_2 & \cdots & s_i \\ \vdots & \vdots & \ddots & \vdots \\ s_{i-1} & s_i & \cdots & s_{2i-2} \end{vmatrix} > 0.$$

To illustrate this assertion, let us consider the simplest case, $n = 2$. Let
$f(t) = t^2 + pt + q$. Then the roots of the polynomial $f(t)$ are real and distinct if and
only if the following two inequalities hold:

$$s_0 > 0, \qquad \begin{vmatrix} s_0 & s_1 \\ s_1 & s_2 \end{vmatrix} > 0. \tag{6.35}$$

The first of these is satisfied for every polynomial, since $s_0$ is simply its degree. If
the roots of the polynomial $f(t)$ are $\alpha$ and $\beta$, then

$$s_0 = 2, \qquad s_1 = \alpha + \beta = -p, \qquad s_2 = \alpha^2 + \beta^2 = (\alpha + \beta)^2 - 2\alpha\beta = p^2 - 2q,$$

and inequality (6.35) yields $2(p^2 - 2q) - p^2 = p^2 - 4q > 0$. This is a criterion
that one learns in secondary school: the roots of a quadratic trinomial are real and
distinct if and only if the discriminant is positive.
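The criterion of Example 6.20 can be tried out on concrete polynomials. The sketch below makes two assumptions that go beyond the text: the polynomial is taken to be monic, $t^n + a_1 t^{n-1} + \cdots + a_n$, and the Newton sums are computed from its coefficients by Newton's identities, a standard fact from the theory of symmetric functions that the text mentions without stating. The function names are hypothetical.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination over exact rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for k in range(n):
        pivot_row = next((r for r in range(k, n) if a[r][k] != 0), None)
        if pivot_row is None:
            return Fraction(0)
        if pivot_row != k:
            a[k], a[pivot_row] = a[pivot_row], a[k]
            det = -det
        det *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return det


def newton_sums(coeffs, count):
    """Return [s_0, ..., s_{count-1}] for t^n + a_1 t^(n-1) + ... + a_n.

    coeffs = [a_1, ..., a_n]. Newton's identities (assumed, not from the text):
    s_k + a_1 s_{k-1} + ... + a_{k-1} s_1 + k a_k = 0   for k <= n,
    s_k + a_1 s_{k-1} + ... + a_n s_{k-n} = 0           for k > n.
    """
    n = len(coeffs)
    a = [Fraction(c) for c in coeffs]
    s = [Fraction(n)]                               # s_0 is the degree
    for k in range(1, count):
        total = sum((a[i - 1] * s[k - i] for i in range(1, min(k - 1, n) + 1)),
                    Fraction(0))
        if k <= n:
            total += k * a[k - 1]
        s.append(-total)
    return s


def roots_real_and_distinct(coeffs):
    """Sylvester's criterion applied to the matrix (s_{i+j-2}) of the form (6.33)."""
    n = len(coeffs)
    s = newton_sums(coeffs, 2 * n - 1)
    matrix = [[s[i + j] for j in range(n)] for i in range(n)]
    return all(determinant([row[:i] for row in matrix[:i]]) > 0
               for i in range(1, n + 1))


print(newton_sums([-6, 11, -6], 5))          # t^3 - 6t^2 + 11t - 6 = (t-1)(t-2)(t-3)
print(roots_real_and_distinct([-6, 11, -6]))
print(roots_real_and_distinct([0, 1]))       # t^2 + 1
```

For $n = 2$ the last determinant is $p^2 - 4q$, so the function reduces to the discriminant test discussed above. A polynomial with a repeated root makes the largest minor vanish, and the function then returns `False`.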

We return now to complex vector spaces and consider certain functions in them 
that are more natural analogues of bilinear and quadratic forms than those examined 
at the beginning of this section. 


Definition 6.22 A function $f(x)$ defined on a complex vector space $L$ and taking
complex values is said to be semilinear if it possesses the following properties:

$$f(x + y) = f(x) + f(y), \qquad f(\alpha x) = \bar{\alpha} f(x), \tag{6.36}$$

for arbitrary vectors $x$ and $y$ in the space $L$ and complex scalar $\alpha$ (here and below,
$\bar{\alpha}$ denotes the complex conjugate of $\alpha$).

It is clear that for every choice of basis $e_1, \ldots, e_n$ of the space $L$, a semilinear
function can be written in the form

$$f(x) = \bar{x}_1 y_1 + \cdots + \bar{x}_n y_n,$$

where the vector $x$ is equal to $x_1 e_1 + \cdots + x_n e_n$, and the scalars $y_i$ are equal
to $f(e_i)$.


Definition 6.23 A function g(x, y) of two vectors in the complex vector space L is 
said to be sesquilinear if it is linear as a function of x for fixed y and semilinear as 
a function of y for fixed x. 


The terminology “sesquilinear” indicates the “full” linearity of the first argument 
and semilinearity of the second. Semilinear and sesquilinear functions are also fre- 
quently called forms. In the sequel, we shall also use such a designation. 

It is obvious that for an arbitrary choice of basis $e_1, \ldots, e_n$ of the space $L$, a
sesquilinear form can be written in the form

$$\varphi(x, y) = \sum_{i,j=1}^{n} \varphi_{ij}\, x_i \bar{y}_j, \quad \text{where } \varphi_{ij} = \varphi(e_i, e_j), \tag{6.37}$$

and the vectors $x$ and $y$ are given by $x = x_1 e_1 + \cdots + x_n e_n$ and
$y = y_1 e_1 + \cdots + y_n e_n$. As in the case of a bilinear form, the matrix
$\Phi = (\varphi_{ij})$ with elements $\varphi_{ij} = \varphi(e_i, e_j)$ as defined above is
called the matrix of the sesquilinear form $\varphi(x, y)$ in the chosen basis.


Definition 6.24 A sesquilinear form $\varphi(x, y)$ is said to be Hermitian if

$$\varphi(y, x) = \overline{\varphi(x, y)} \tag{6.38}$$

for arbitrary choice of vectors $x$ and $y$.

It is obvious that in the expression (6.37), the Hermitian nature of the form
$\varphi(x, y)$ is expressed by the property $\varphi_{ij} = \overline{\varphi_{ji}}$ of the
coefficients $\varphi_{ij}$ of its matrix $\Phi$, that is, by the relationship
$\Phi = \overline{\Phi}^{*}$. A matrix exhibiting these properties is also called Hermitian.

After separating real and imaginary parts in $\varphi(x, y)$, we obtain

$$\varphi(x, y) = u(x, y) + i\,v(x, y), \tag{6.39}$$

where $u(x, y)$ and $v(x, y)$ are functions of two vectors $x$ and $y$ of the complex
space $L$ taking real values. In the space $L$, multiplication by a real scalar is also
defined, and so it may be viewed as a real vector space. We shall denote this real
vector space by $L_{\mathbb{R}}$. Obviously, in the space $L_{\mathbb{R}}$, the functions
$u(x, y)$ and $v(x, y)$ are bilinear, and the property of the complex form $\varphi(x, y)$
being Hermitian implies that on $L_{\mathbb{R}}$, the bilinear form $u(x, y)$ is symmetric,
while $v(x, y)$ is antisymmetric.


Definition 6.25 A function $\psi(x)$ on a complex vector space $L$ is said to be
quadratic Hermitian if it can be expressed in the form

$$\psi(x) = \varphi(x, x) \tag{6.40}$$

for some Hermitian form $\varphi(x, y)$.


From the definition of Hermitian form, it follows at once that the values of 
quadratic Hermitian functions are real. 


Theorem 6.26 A quadratic Hermitian function $\psi(x)$ uniquely determines a
Hermitian sesquilinear form $\varphi(x, y)$ as presented in (6.40).


Proof By the definition of sesquilinearity, we have

$$\psi(x + y) = \psi(x) + \psi(y) + \varphi(x, y) + \varphi(y, x). \tag{6.41}$$

Substituting here the expression (6.39) and using the Hermitian property (6.38), by
which $\varphi(x, y) + \varphi(y, x) = 2u(x, y)$, we obtain that

$$u(x, y) = \frac{1}{2}\bigl(\psi(x + y) - \psi(x) - \psi(y)\bigr). \tag{6.42}$$

Similarly, from the relationship

$$\psi(x + iy) = \psi(x) + \psi(iy) + \varphi(x, iy) + \varphi(iy, x) \tag{6.43}$$

we obtain by the properties of being Hermitian and sesquilinearity that

$$\varphi(x, iy) = -i\,\varphi(x, y), \qquad \varphi(iy, x) = \overline{\varphi(x, iy)},$$

which yields

$$v(x, y) = \frac{1}{2}\bigl(\psi(x + iy) - \psi(x) - \psi(iy)\bigr). \tag{6.44}$$

The expressions (6.42) and (6.44) thus obtained complete the proof of the theorem.
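Formulas (6.42) and (6.44) can be verified on a small example. The following sketch assumes a Hermitian matrix given as nested lists of Python complex numbers and uses the convention of (6.37): linear in the first argument, semilinear in the second. The function names `phi`, `psi` and `recover_phi` are hypothetical; `recover_phi` uses only values of the quadratic function.

```python
def phi(matrix, x, y):
    """Sesquilinear form (6.37): sum of phi_ij * x_i * conj(y_j)."""
    n = len(matrix)
    return sum(matrix[i][j] * x[i] * y[j].conjugate()
               for i in range(n) for j in range(n))


def psi(matrix, x):
    """Quadratic Hermitian function (6.40); real when the matrix is Hermitian."""
    return phi(matrix, x, x).real


def recover_phi(matrix, x, y):
    """Rebuild phi(x, y) = u + i*v from values of psi alone, by (6.42) and (6.44)."""
    x_plus_y = [a + b for a, b in zip(x, y)]
    iy = [1j * b for b in y]
    x_plus_iy = [a + b for a, b in zip(x, iy)]
    u = (psi(matrix, x_plus_y) - psi(matrix, x) - psi(matrix, y)) / 2
    v = (psi(matrix, x_plus_iy) - psi(matrix, x) - psi(matrix, iy)) / 2
    return complex(u, v)


# An illustrative Hermitian matrix: phi_21 is the conjugate of phi_12.
hermitian = [[2 + 0j, 1 + 1j],
             [1 - 1j, 3 + 0j]]
x = [1 + 0j, 1j]
y = [2 - 1j, 1 + 0j]
print(phi(hermitian, x, y), recover_phi(hermitian, x, y))
```

The two printed values agree, which is the content of Theorem 6.26: the quadratic function alone determines the Hermitian form.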


Theorem 6.27 A sesquilinear form $\varphi(x, y)$ is Hermitian if and only if the function
$\psi(x)$ associated with it by relationship (6.40) assumes only real values.


Proof If a sesquilinear form $\varphi(x, y)$ is Hermitian, then by definition (6.38), we
have the equality $\varphi(x, x) = \overline{\varphi(x, x)}$ for all $x \in L$, from which it
follows that for an arbitrary vector $x \in L$, the value $\psi(x)$ is a real number.




On the other hand, if the values of the function $\psi(x)$ are real, then arguing just
as we did in the proof of Theorem 6.26, we obtain from formula (6.41), which uses
only sesquilinearity, that the value

$$\psi(x + y) - \psi(x) - \psi(y) = \varphi(x, y) + \varphi(y, x)$$

is real. Substituting here the expression (6.39), we see that the sum $v(x, y) + v(y, x)$
is equal to zero, that is, the function $v(x, y)$ is antisymmetric.
Reasoning similarly, from formula (6.43), we conclude that the value

$$\psi(x + iy) - \psi(x) - \psi(iy) = \varphi(x, iy) + \varphi(iy, x)$$

is also real. From the definition of semilinearity and sesquilinearity, we have the
relationships $\varphi(iy, x) = i\,\varphi(y, x)$ and $\varphi(x, iy) = -i\,\varphi(x, y)$. We
thereby obtain that the number

$$i\bigl(\varphi(y, x) - \varphi(x, y)\bigr)$$

is real, which by virtue of the expression (6.39) gives the equality
$u(y, x) - u(x, y) = 0$; that is, the function $u(x, y)$ is symmetric. Consequently,
$\varphi(y, x) = u(x, y) - i\,v(x, y) = \overline{\varphi(x, y)}$, and the form
$\varphi(x, y)$ is Hermitian.


Hermitian forms are the most natural complex analogues of symmetric forms. 
They exhibit analogous properties to those that we derived for symmetric forms in 
real vector spaces (with completely analogous proofs), namely reduction to canon- 
ical form, the law of inertia, the notion of positive definiteness, and Sylvester’s cri- 
terion. 


Chapter 7 
Euclidean Spaces 


The notions entering into the definition of a vector space do not provide a way of 
formulating multidimensional analogues of the length of a vector, the angle between 
vectors, and volumes. Yet such concepts appear in many branches of mathematics 
and physics, and we shall study such concepts in this chapter. All the vector spaces 
that we shall consider here will be real (with the exception of certain special cases in 
which complex vector spaces will be considered as a means of studying real spaces). 


7.1 The Definition of a Euclidean Space 


Definition 7.1 A Euclidean space is a real vector space on which is defined a fixed 
symmetric bilinear form whose associated quadratic form is positive definite. 


The vector space itself will be denoted as a rule by L, and the fixed symmetric 
bilinear form will be denoted by (x, y). Such an expression is also called the inner 
product of the vectors x and y. Let us now reformulate the definition of a Euclidean 
space using this terminology. 

A Euclidean space is a real vector space L in which to every pair of vectors x 
and y there corresponds a real number (x, y) such that the following conditions are 
satisfied: 


(1) $(x_1 + x_2, y) = (x_1, y) + (x_2, y)$ for all vectors $x_1, x_2, y \in L$.
(2) $(\alpha x, y) = \alpha (x, y)$ for all vectors $x, y \in L$ and real number $\alpha$.
(3) $(x, y) = (y, x)$ for all vectors $x, y \in L$.
(4) $(x, x) > 0$ for $x \ne 0$.


Properties (1)-(3) show that the function $(x, y)$ is a symmetric bilinear form on
$L$, and in particular, that $(0, y) = 0$ for every vector $y \in L$. It is only property (4)
that expresses the specific character of a Euclidean space.

The expression $(x, x)$ is frequently denoted by $(x^2)$; it is called the scalar square
of the vector $x$. Thus property (4) implies that the quadratic form corresponding to
the bilinear form $(x, y)$ is positive definite.


I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry,
DOI 10.1007/978-3-642-30994-6_7, © Springer-Verlag Berlin Heidelberg 2013




Let us point out some obvious consequences of these definitions. For a fixed
vector $y \in L$, where $L$ is a Euclidean space, conditions (1) and (2) in the definition
state that the function $f_y(x) = (x, y)$ with argument $x$ is linear. Thus we have a
mapping $y \mapsto f_y$ of the vector space $L$ to $L^*$. Condition (4)
in the definition of Euclidean space shows that the kernel of this mapping is equal
to $(0)$. Indeed, $f_y \ne 0$ for every $y \ne 0$, since $f_y(y) = (y^2) > 0$. If the dimension
of the space $L$ is finite, then by Theorems 3.68 and 3.78, this mapping is an
isomorphism. Moreover, we should note that in contrast to the construction used for
proving Theorem 3.78, we have now constructed an isomorphism $L \simeq L^*$ without
using the specific choice of a basis in $L$. Thus we have a certain natural isomorphism
$L \simeq L^*$ defined only by the imposition of an inner product on $L$. In view of
this, in the case of a finite-dimensional Euclidean space $L$, we shall in what follows
sometimes identify $L$ and $L^*$. In other words, as for any bilinear form, for the inner
product $(x, y)$ there exists a unique linear transformation $\mathcal{A} : L \to L^*$ such
that $(x, y) = \mathcal{A}(y)(x)$. The previous reasoning shows that in the case of a
Euclidean space, the transformation $\mathcal{A}$ is an isomorphism, and in particular, the
bilinear form $(x, y)$ is nonsingular. Let us give some examples of Euclidean spaces.


Example 7.2 The plane, in which for (x, y) is taken the well-known inner product 
of x and y as studied in analytic geometry, that is, the product of the vectors’ lengths 
and the cosine of the angle between them, is a Euclidean space. 


Example 7.3 The space $\mathbb{R}^n$ consisting of rows (or columns) of length $n$, in which
the inner product of rows $x = (\alpha_1, \ldots, \alpha_n)$ and
$y = (\beta_1, \ldots, \beta_n)$ is defined by the relation

$$(x, y) = \alpha_1 \beta_1 + \alpha_2 \beta_2 + \cdots + \alpha_n \beta_n, \tag{7.1}$$

is a Euclidean space.


Example 7.4 The vector space $L$ consisting of polynomials of degree at most $n$
with real coefficients, defined on some interval $[a, b]$, is a Euclidean space. For two
polynomials $f(t)$ and $g(t)$, their inner product is defined by the relation

$$(f, g) = \int_a^b f(t)\, g(t)\, dt. \tag{7.2}$$


Example 7.5 The vector space L consisting of all real-valued continuous functions 
on the interval [a, b] is a Euclidean space. For two such functions f(t) and g(t), we 
shall define their inner product by equality (7.2). 
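For polynomials, the inner product (7.2) can be computed exactly. The sketch below is an illustration under stated assumptions: a polynomial is represented by its list of coefficients `[c0, c1, ...]` in increasing powers of $t$, the default interval is $[0, 1]$, and the names `poly_mul`, `integrate` and `inner` are hypothetical.

```python
from fractions import Fraction


def poly_mul(f, g):
    """Coefficients of the product of two polynomials."""
    result = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            result[i + j] += Fraction(a) * Fraction(b)
    return result


def integrate(poly, a, b):
    """Exact integral of a polynomial over [a, b], term by term."""
    a, b = Fraction(a), Fraction(b)
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(poly))


def inner(f, g, a=0, b=1):
    """Inner product (7.2) of two polynomials on the interval [a, b]."""
    return integrate(poly_mul(f, g), a, b)


one, t = [1], [0, 1]
print(inner(one, t), inner(t, t))     # on [0, 1]
print(inner(one, t, -1, 1))           # on [-1, 1]
```

On $[0, 1]$ the inner products of $1, t, t^2, \ldots$ are the fractions $1/(i + j + 1)$; on the symmetric interval $[-1, 1]$ the polynomials $1$ and $t$ have inner product zero.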


Example 7.5 shows that a Euclidean space, like a vector space, does not have to
be finite-dimensional.¹ In the sequel, we shall be concerned exclusively with
finite-dimensional Euclidean spaces, on which the inner product is sometimes called the
scalar product (because the inner product of two vectors is a scalar) or dot product
(because the notation $x \cdot y$ is frequently used instead of $(x, y)$).

¹ Infinite-dimensional Euclidean spaces are usually called pre-Hilbert spaces. An especially
important role in a number of branches of mathematics and physics is played by the so-called
Hilbert spaces, which are pre-Hilbert spaces that have the additional property of completeness,
a property that is relevant only in the case of infinite dimension. (Sometimes, in the
definition of pre-Hilbert space, the condition $(x, x) > 0$ is replaced by the weaker condition
$(x, x) \ge 0$.)


Example 7.6 Every subspace $L'$ of a Euclidean space $L$ is itself a Euclidean space if
we define on it the form $(x, y)$ exactly as on the space $L$.


In analogy with Example 7.2, we make the following definition.


Definition 7.7 The length of a vector $x$ in a Euclidean space is the nonnegative
value $\sqrt{(x^2)}$. The length of a vector $x$ is denoted by $|x|$.


We note that we have here made essential use of property (4), by which the length
of a nonnull vector is a positive number.

Following the same analogy, it is natural to define the angle $\varphi$ between two
nonnull vectors $x$ and $y$ by the condition

$$\cos\varphi = \frac{(x, y)}{|x| \cdot |y|}, \qquad 0 \le \varphi \le \pi. \tag{7.3}$$

However, such a number $\varphi$ exists only if the expression on the right-hand side of
equality (7.3) does not exceed 1 in absolute value. Such is indeed the case, and the
proof of this fact will be our immediate objective.


Lemma 7.8 Given a vector $e \ne 0$, every vector $x \in L$ can be expressed in the form

$$x = \alpha e + y, \qquad (e, y) = 0, \tag{7.4}$$

for some scalar $\alpha$ and vector $y \in L$; see Fig. 7.1.

Fig. 7.1 Orthogonal projection

Proof Setting $y = x - \alpha e$, we obtain $\alpha$ from the condition $(e, y) = 0$. This is
equivalent to $(x, e) = \alpha (e, e)$, which implies that $\alpha = (x, e)/|e|^2$. We remark
that $|e| \ne 0$, since by assumption, $e \ne 0$.




Definition 7.9 The vector $\alpha e$ from relation (7.4) is called the orthogonal projection
of the vector $x$ onto the line $\langle e \rangle$.


Theorem 7.10 The length of the orthogonal projection of a vector x is at most its 
length |x|. 


Proof Indeed, since by definition, $x = \alpha e + y$ and $(e, y) = 0$, it follows that

$$|x|^2 = (x^2) = (\alpha e + y, \alpha e + y) = |\alpha e|^2 + |y|^2 \ge |\alpha e|^2,$$

and this implies that

$$|x| \ge |\alpha e|. \tag{7.5}$$


This leads directly to the following necessary theorem. 


Theorem 7.11 For arbitrary vectors $x$ and $y$ in a Euclidean space, the following
inequality holds:

$$|(x, y)| \le |x| \cdot |y|. \tag{7.6}$$


Proof If one of the vectors $x$, $y$ is equal to zero, then the inequality (7.6) is obvious,
and is reduced to the equality $0 = 0$. Now suppose that neither vector is the null
vector. In this case, let us denote by $\alpha y$ the orthogonal projection of the vector
$x$ onto the line $\langle y \rangle$. Then by (7.4), we have the relationship
$x = \alpha y + z$, where $(y, z) = 0$. From this we obtain the equality

$$(x, y) = (\alpha y + z, y) = (\alpha y, y) = \alpha |y|^2.$$

This means that $|(x, y)| = |\alpha| \cdot |y|^2 = |\alpha y| \cdot |y|$. But by Theorem 7.10,
we have the inequality $|\alpha y| \le |x|$, and consequently, $|(x, y)| \le |x| \cdot |y|$.
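The decomposition (7.4) and the inequalities (7.5) and (7.6) are easy to check for vectors of $\mathbb{R}^n$ with the inner product (7.1). The sketch below is illustrative only; it works with exact rationals and compares squared lengths so that no square roots are needed. The names `inner`, `project` and `cauchy_schwarz_holds` are hypothetical.

```python
from fractions import Fraction


def inner(x, y):
    """Inner product (7.1) of two coordinate rows of equal length."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))


def project(x, e):
    """Return (alpha, y) with x = alpha*e + y and (e, y) = 0, as in (7.4).

    The vector e must be nonzero.
    """
    alpha = inner(x, e) / inner(e, e)
    y = [Fraction(a) - alpha * Fraction(b) for a, b in zip(x, e)]
    return alpha, y


def cauchy_schwarz_holds(x, y):
    """Check (7.6) in squared form: (x, y)^2 <= (x, x) * (y, y)."""
    return inner(x, y) ** 2 <= inner(x, x) * inner(y, y)


x, e = [1, 2, 3], [1, 1, 1]
alpha, y = project(x, e)
print(alpha, y, inner(e, y))
print(alpha ** 2 * inner(e, e) <= inner(x, x))   # (7.5) in squared form
print(cauchy_schwarz_holds(x, e))
```

Here the projection coefficient is $(x, e)/|e|^2 = 6/3 = 2$, and the remainder $y = (-1, 0, 1)$ is orthogonal to $e$.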


Inequality (7.6) goes by a number of names, but it is generally known as the
Cauchy-Schwarz inequality. From it we can derive the well-known triangle inequality
from elementary geometry. Indeed, suppose that the vectors $x = \overrightarrow{AB}$,
$y = \overrightarrow{BC}$, $z = \overrightarrow{CA}$ correspond to the sides of a triangle
$ABC$. Then we have the relationship $x + y + z = 0$, from which with the help of (7.6) we
obtain the inequality

$$|z|^2 = (x + y, x + y) = |x|^2 + 2(x, y) + |y|^2 \le |x|^2 + 2|(x, y)| + |y|^2 \le |x|^2 + 2|x| \cdot |y| + |y|^2 = (|x| + |y|)^2,$$

from which clearly follows the triangle inequality $|z| \le |x| + |y|$.
Thus from Theorem 7.11 it follows that there exists a number $\varphi$ that satisfies the
equality (7.3). This number is what is called the angle between the vectors $x$ and $y$.
Condition (7.3) determines the angle uniquely if we assume that $0 \le \varphi \le \pi$.




Definition 7.12 Two vectors x and y are said to be orthogonal if their inner product 
is equal to zero: (x, y) = 0. 


Let us note that this repeats the definition given in Sect. 6.2 for a bilinear form
$\varphi(x, y) = (x, y)$. By the definition given above in (7.3), the angle between
orthogonal vectors is equal to $\pi/2$.

For a Euclidean space, there is a useful criterion for the linear independence of
vectors. Let $a_1, \ldots, a_m$ be $m$ vectors in the Euclidean space $L$.


Definition 7.13 The Gram determinant, or Gramian, of a system of vectors
$a_1, \ldots, a_m$ is the determinant

$$G(a_1, \ldots, a_m) = \begin{vmatrix} (a_1, a_1) & (a_1, a_2) & \cdots & (a_1, a_m) \\ (a_2, a_1) & (a_2, a_2) & \cdots & (a_2, a_m) \\ \vdots & \vdots & \ddots & \vdots \\ (a_m, a_1) & (a_m, a_2) & \cdots & (a_m, a_m) \end{vmatrix}. \tag{7.7}$$


Theorem 7.14 If the vectors $a_1, \ldots, a_m$ are linearly dependent, then the Gram
determinant $G(a_1, \ldots, a_m)$ is equal to zero, while if they are linearly independent,
then $G(a_1, \ldots, a_m) > 0$.
Proof If the vectors $a_1, \ldots, a_m$ are linearly dependent, then as was shown in
Sect. 3.2, one of the vectors can be expressed as a linear combination of the others.
Let it be the vector $a_m$, that is,
$a_m = \alpha_1 a_1 + \cdots + \alpha_{m-1} a_{m-1}$. Then from the
properties of the inner product, it follows that for every $i = 1, \ldots, m$, we have the
equality

$$(a_m, a_i) = \alpha_1 (a_1, a_i) + \alpha_2 (a_2, a_i) + \cdots + \alpha_{m-1} (a_{m-1}, a_i).$$

From this it is clear that if we subtract from the last row of the determinant (7.7) all
the previous rows multiplied by the coefficients $\alpha_1, \ldots, \alpha_{m-1}$, then we
obtain a determinant with a row consisting entirely of zeros. Therefore,
$G(a_1, \ldots, a_m) = 0$.

Now suppose that the vectors $a_1, \ldots, a_m$ are linearly independent. Let us consider in
the subspace $L' = \langle a_1, \ldots, a_m \rangle$ the quadratic form $(x^2)$. Setting
$x = \alpha_1 a_1 + \cdots + \alpha_m a_m$, we may write it in the form

$$\bigl((\alpha_1 a_1 + \cdots + \alpha_m a_m)^2\bigr) = \sum_{i,j=1}^{m} \alpha_i \alpha_j (a_i, a_j).$$

It is easily seen that this quadratic form is positive definite, and its determinant
coincides with the Gram determinant $G(a_1, \ldots, a_m)$. By Theorem 6.19, it now follows
that $G(a_1, \ldots, a_m) > 0$.
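Theorem 7.14 gives a practical test for linear independence. The sketch below is an illustration for vectors of $\mathbb{R}^n$ with the inner product (7.1): it forms the matrix of pairwise inner products and evaluates its determinant over exact rationals. The names `inner`, `determinant` and `gram_determinant` are hypothetical.

```python
from fractions import Fraction


def inner(x, y):
    """Inner product (7.1) of two coordinate rows of equal length."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))


def determinant(matrix):
    """Determinant of a square matrix by Gaussian elimination over exact rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for k in range(n):
        pivot_row = next((r for r in range(k, n) if a[r][k] != 0), None)
        if pivot_row is None:
            return Fraction(0)
        if pivot_row != k:
            a[k], a[pivot_row] = a[pivot_row], a[k]
            det = -det
        det *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return det


def gram_determinant(vectors):
    """The Gram determinant (7.7) of a list of vectors."""
    return determinant([[inner(a, b) for b in vectors] for a in vectors])


print(gram_determinant([[1, 0, 0], [1, 1, 0]]))   # linearly independent
print(gram_determinant([[1, 2], [2, 4]]))         # linearly dependent
```

The first pair gives a positive value and the second gives zero, in agreement with the theorem.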


Theorem 7.14 is a broad generalization of the Cauchy-Schwarz inequality. Indeed,
for $m = 2$, inequality (7.6) is obvious (it becomes an equality) if the vectors $x$
and $y$ are linearly dependent. However, if $x$ and $y$ are linearly independent, then
their Gram determinant is equal to

$$G(x, y) = \begin{vmatrix} (x, x) & (x, y) \\ (y, x) & (y, y) \end{vmatrix} = |x|^2 |y|^2 - (x, y)^2.$$

The inequality $G(x, y) > 0$ established in Theorem 7.14 gives us (7.6). In particular,
we see that inequality (7.6) becomes an equality only if the vectors $x$ and $y$
are proportional. We remark that this is easy to derive if we examine the proof of
Theorem 7.11.


Definition 7.15 Vectors $e_1, \ldots, e_m$ in a Euclidean space form an orthonormal
system if

$$(e_i, e_j) = 0 \ \text{ for } i \ne j, \qquad (e_i, e_i) = 1, \tag{7.8}$$

that is, if these vectors are mutually orthogonal and the length of each of them is
equal to 1. If $m = n$ and the vectors $e_1, \ldots, e_n$ form a basis of the space, then such
a basis is called an orthonormal basis.


It is obvious that the Gram determinant of an orthonormal basis is equal to 1. 

We shall now use the fact that the quadratic form $(x^2)$ is positive definite and
apply to it formula (6.28), in which by the definition of positive definiteness, $s = n$.
This result can now be reformulated as an assertion about the existence of a basis
$e_1, \ldots, e_n$ of the space $L$ in which the scalar square of a vector
$x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ is equal to the sum of the squares of its
coordinates, that is, $(x^2) = \alpha_1^2 + \cdots + \alpha_n^2$.
In other words, we have the following result.


Theorem 7.16 Every Euclidean space has an orthonormal basis. 


Remark 7.17 In an orthonormal basis, the inner product of $x = (\alpha_1, \dots, \alpha_n)$ and
$y = (\beta_1, \dots, \beta_n)$ has a particularly simple form, given by formula (7.1). Accordingly,
in an orthonormal basis, the scalar square of an arbitrary vector is equal to the
sum of the squares of its coordinates, while its length is equal to the square root of
the sum of the squares.


The lemma establishing the decomposition (7.4) has an important and far-reaching
generalization. To formulate it, we recall that in Sect. 3.7, for every subspace
$L' \subset L$ we defined its annihilator $(L')^a \subset L^*$, while earlier in this section, we
showed that an arbitrary Euclidean space $L$ of finite dimension can be identified
with its dual space $L^*$. As a result, we can view $(L')^a$ as a subspace of the original
space $L$. In this light, we shall call it the orthogonal complement of the subspace
$L'$ and denote it by $(L')^\perp$. If we recall the relevant definitions, we obtain that the
orthogonal complement $(L')^\perp$ of the subspace $L' \subset L$ consists of all vectors $y \in L$
for which the following condition holds:

$$
(x, y) = 0 \quad \text{for all } x \in L'. \tag{7.9}
$$

On the other hand, $(L')^\perp$ is the subspace $(L')^\perp_\varphi$ defined for the case that the bilinear
form $\varphi(x, y)$ is given by $\varphi(x, y) = (x, y)$; see p. 198.

A basic property of the orthogonal complement in a finite-dimensional Euclidean 
space is contained in the following theorem. 


Theorem 7.18 For an arbitrary subspace $L_1$ of a finite-dimensional Euclidean
space $L$, the following holds:

$$
L = L_1 \oplus L_1^\perp. \tag{7.10}
$$

In the case $L_1 = \langle e \rangle$, Theorem 7.18 follows from Lemma 7.8.


Proof of Theorem 7.18 In the previous chapter, we saw that every quadratic form
$\psi(x)$ in some basis of a vector space $L$ can be reduced to the canonical form (6.22),
and in the case of a real vector space, to the form (6.28) for some integers $0 \le s \le r$,
where $s$ is the index of inertia and $r$ is the rank of the quadratic form $\psi(x)$, or
equivalently, the rank of the symmetric bilinear form $\varphi(x, y)$ associated with $\psi(x)$
by the relationship (6.11). We recall that a bilinear form $\varphi(x, y)$ is nonsingular if
$r = n$, where $n = \dim L$.

The condition of positive definiteness for the form $\psi(x)$ is equivalent to the
condition that all scalars $\lambda_1, \dots, \lambda_n$ in (6.22) be positive, or equivalently, that the
equality $s = r = n$ hold in formula (6.28). From this it follows that a symmetric
bilinear form $\varphi(x, y)$ associated with a positive definite quadratic form $\psi(x)$ is
nonsingular on the space $L$ as well as on every subspace $L' \subset L$. To complete the
proof, it suffices to recall that by definition, the quadratic form $(x^2)$ associated with
the inner product $(x, y)$ is positive definite and to use Theorem 6.9 for the bilinear
form $\varphi(x, y) = (x, y)$.


From relationship (3.54) for the annihilator (see Sect. 3.7) or from Theorem 7.18,
it follows that

$$
\dim L_1^\perp = \dim L - \dim L_1.
$$

The map that is the projection of the space $L$ onto the subspace $L_1$ parallel to $L_1^\perp$
(see the definition on p. 103) is called the orthogonal projection of $L$ onto $L_1$. Then
the projection of the vector $x \in L$ onto the subspace $L_1$ is called its orthogonal
projection onto $L_1$. This is a natural generalization of the notion introduced above
of orthogonal projection of a vector onto a line. Similarly, for an arbitrary subset
$X \subset L$, we can define its orthogonal projection onto $L_1$.

The Gram determinant is connected to the notion of volume in a Euclidean space, 
generalizing the notion of the length of a vector. 


Definition 7.19 The parallelepiped spanned by vectors $a_1, \dots, a_m$ is the collection
of all vectors $\alpha_1 a_1 + \cdots + \alpha_m a_m$ for all $0 \le \alpha_i \le 1$. It is denoted by
$\Pi(a_1, \dots, a_m)$. A base of the parallelepiped $\Pi(a_1, \dots, a_m)$ is a parallelepiped spanned by any
$m - 1$ vectors among $a_1, \dots, a_m$, for example, $\Pi(a_1, \dots, a_{m-1})$.


Fig. 7.2 Altitude of a parallelepiped

In the case of the plane (see Example 7.2), we have parallelepipeds $\Pi(a_1)$ and
$\Pi(a_1, a_2)$. By definition, $\Pi(a_1)$ is the segment whose beginning and end coincide
with the beginning and end of the vector $a_1$, while $\Pi(a_1, a_2)$ is the parallelogram
constructed from the vectors $a_1$ and $a_2$.

We return now to the consideration of an arbitrary parallelepiped $\Pi(a_1, \dots, a_m)$,
and we define the subspace $L_1 = \langle a_1, \dots, a_{m-1} \rangle$. To this case we may apply the
notion introduced above of orthogonal projection of the space $L$. By the decomposition
(7.10), the vector $a_m$ can be uniquely represented in the form $a_m = x + y$,
where $x \in L_1$ and $y \in L_1^\perp$. The vector $y$ is called the altitude of the parallelepiped
$\Pi(a_1, \dots, a_m)$ dropped to the base $\Pi(a_1, \dots, a_{m-1})$. The construction we have
described is depicted in Fig. 7.2 for the case of the plane.

Now we can introduce the concept of volume of a parallelepiped $\Pi(a_1, \dots, a_m)$,
or more precisely, its unoriented volume. This is by definition a nonnegative number,
denoted by $V(a_1, \dots, a_m)$ and defined by induction on $m$. In the case $m = 1$, it is
equal to $V(a_1) = |a_1|$, and in the general case, $V(a_1, \dots, a_m)$ is the product of
$V(a_1, \dots, a_{m-1})$ and the length of the altitude of the parallelepiped $\Pi(a_1, \dots, a_m)$
dropped to the base $\Pi(a_1, \dots, a_{m-1})$.

The following is a numerical expression for the unoriented volume:

$$
V^2(a_1, \dots, a_m) = G(a_1, \dots, a_m). \tag{7.11}
$$

This relationship shows the geometric meaning of the Gram determinant.
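As an informal numerical check of (7.11), the following Python sketch computes the unoriented volume directly from the inductive definition (as a product of altitude lengths) and compares its square with the Gram determinant. It assumes that the vectors are given by their coordinates in an orthonormal basis, so that the inner product has the form (7.1). The function names `dot`, `det`, `gram`, and `volume` are illustrative only, and finding each altitude by subtracting projections onto the previously found altitudes is an implementation choice of this sketch, not something the text prescribes.

```python
import math

def dot(u, v):
    """Inner product (7.1) of two coordinate rows in an orthonormal basis."""
    return sum(p * q for p, q in zip(u, v))

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def gram(vectors):
    """Gram determinant G(a_1, ..., a_m)."""
    return det([[dot(a, b) for b in vectors] for a in vectors])

def volume(vectors):
    """Unoriented volume V(a_1, ..., a_m) as a product of altitude lengths."""
    altitudes, vol = [], 1.0
    for a in vectors:
        y = list(a)
        for b in altitudes:  # earlier nonzero altitudes, mutually orthogonal
            c = dot(a, b) / dot(b, b)
            y = [yi - c * bi for yi, bi in zip(y, b)]
        height = math.sqrt(dot(y, y))  # length of the altitude dropped to the base
        vol *= height
        if height > 1e-12:
            altitudes.append(y)
    return vol

vectors = [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 3.0]]
print(volume(vectors) ** 2, gram(vectors))
```

For linearly dependent vectors one of the altitudes has length zero, so the sketch returns volume zero, in agreement with Theorem 7.14.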


Formula (7.11) is obvious for $m = 1$, and in the general case, it is proved by
induction on $m$. According to (7.10), we may represent the vector $a_m$ in the form
$a_m = x + y$, where $x \in L_1 = \langle a_1, \dots, a_{m-1} \rangle$ and $y \in L_1^\perp$. Then
$a_m = \alpha_1 a_1 + \cdots + \alpha_{m-1} a_{m-1} + y$. We note that $y$ is the altitude of our
parallelepiped dropped to the base $\Pi(a_1, \dots, a_{m-1})$. Let us recall formula (7.7) for the
Gram determinant and subtract from its last column each of the other columns multiplied
by $\alpha_1, \dots, \alpha_{m-1}$, respectively. As a result, we obtain

$$
G(a_1, \dots, a_m) = \begin{vmatrix}
(a_1, a_1) & (a_1, a_2) & \cdots & 0 \\
(a_2, a_1) & (a_2, a_2) & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
(a_{m-1}, a_1) & (a_{m-1}, a_2) & \cdots & 0 \\
(a_m, a_1) & (a_m, a_2) & \cdots & (y, a_m)
\end{vmatrix}, \tag{7.12}
$$

and moreover, $(y, a_m) = (y, y) = |y|^2$, since $y \in L_1^\perp$.
Expanding the determinant (7.12) along its last column, we obtain the equality

$$
G(a_1, \dots, a_m) = G(a_1, \dots, a_{m-1}) \, |y|^2.
$$

Let us recall that by construction, $y$ is the altitude of the parallelepiped
$\Pi(a_1, \dots, a_m)$ dropped to the base $\Pi(a_1, \dots, a_{m-1})$. By the induction hypothesis, we have
$G(a_1, \dots, a_{m-1}) = V^2(a_1, \dots, a_{m-1})$, and this implies

$$
G(a_1, \dots, a_m) = V^2(a_1, \dots, a_{m-1}) \, |y|^2 = V^2(a_1, \dots, a_m).
$$


Thus the concept of unoriented volume that we have introduced differs from the
volume and area about which we spoke in Sects. 2.1 and 2.6, since the unoriented
volume cannot assume negative values. This explains the term “unoriented.” We
shall now formulate a second way of looking at the volume of a parallelepiped,
one that generalizes the notions of volume and area about which we spoke earlier
and differs from unoriented volume by the sign $\pm 1$. By Theorem 7.14, of interest
is only the case in which the vectors $a_1, \dots, a_m$ are linearly independent. Then we
may consider the space $L = \langle a_1, \dots, a_m \rangle$ with basis $a_1, \dots, a_m$.
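Thus we are given $n$ vectors $a_1, \dots, a_n$, where $n = \dim L$. We consider the matrix
$A$, whose $j$th column consists of the coordinates of the vector $a_j$ relative to some
orthonormal basis $e_1, \dots, e_n$:

$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{pmatrix}.
$$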


An easy verification shows that in the matrix $A^*A$, the intersection of the $i$th row
and $j$th column contains the element $(a_i, a_j)$. This implies that the determinant of
the matrix $A^*A$ is equal to $G(a_1, \dots, a_n)$, and in view of the equalities
$|A^*A| = |A^*| \cdot |A| = |A|^2$, we obtain $|A|^2 = G(a_1, \dots, a_n)$. On the other hand, from formula
(7.11), it follows that $G(a_1, \dots, a_n) = V^2(a_1, \dots, a_n)$, and this implies that

$$
|A| = \pm V(a_1, \dots, a_n).
$$

The determinant of the matrix $A$ is called the oriented volume of the $n$-dimensional
parallelepiped $\Pi(a_1, \dots, a_n)$. It is denoted by $v(a_1, \dots, a_n)$. Thus the oriented and
unoriented volumes are related by the equality

$$
V(a_1, \dots, a_n) = |v(a_1, \dots, a_n)|.
$$


Since the determinant of a matrix does not change under the transpose operation,
it follows that $v(a_1, \dots, a_n) = |A^*|$. In other words, for computing the oriented
volume, one may write the coordinates of the generators $a_j$ of the parallelepiped not
in the columns of the matrix, but in the rows, which is sometimes more convenient.

It is obvious that the sign of the oriented volume depends on the choice of
orthonormal basis $e_1, \dots, e_n$. This dependence is suggested by the term “oriented.”
We shall have more to say about this in Sect. 7.3.
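The relations among the oriented volume, the Gram determinant, and the choice of orthonormal basis can be tried out on a small example. The sketch below is only an illustration under stated assumptions: the three vectors are hypothetical sample data with integer coordinates in an orthonormal basis $e_1, e_2, e_3$, and the helpers `det`, `transpose`, and `matmul` are names introduced here, not notation from the text.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

vectors = [[1, 0, 0], [1, 2, 0], [1, 1, 3]]   # a_1, a_2, a_3 (sample data)
A = transpose(vectors)                        # j-th column of A holds the coordinates of a_j
gram_matrix = matmul(transpose(A), A)         # entry (i, j) equals (a_i, a_j)

v = det(A)                                    # oriented volume v(a_1, a_2, a_3)
v_rows = det(vectors)                         # same vectors written as rows
v_other_basis = det([A[1], A[0], A[2]])       # coordinates in the basis e_2, e_1, e_3

print(v, v_rows, v_other_basis, det(gram_matrix))
```

Passing to the orthonormal basis $e_2, e_1, e_3$ interchanges two rows of $A$, which changes the sign of the determinant but not its absolute value, so the unoriented volume is unaffected.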

The volume possesses some important properties. 


Theorem 7.20 Let $\mathcal{C} : L \to L$ be a linear transformation of the Euclidean space $L$
of dimension $n$. Then for any $n$ vectors $a_1, \dots, a_n$ in this space, one has the
relationship

$$
v(\mathcal{C}(a_1), \dots, \mathcal{C}(a_n)) = |\mathcal{C}| \, v(a_1, \dots, a_n). \tag{7.13}
$$

Proof We shall choose an orthonormal basis of the space $L$. Suppose that the
transformation $\mathcal{C}$ has matrix $C$ in this basis and that the coordinates $\alpha_1, \dots, \alpha_n$ of an
arbitrary vector $a$ are related to the coordinates $\beta_1, \dots, \beta_n$ of its image $\mathcal{C}(a)$ by
the relationship (3.25), or in matrix notation, (3.27). Let $A$ be the matrix whose
columns consist of the coordinates of the vectors $a_1, \dots, a_n$, and let $A'$ be the matrix
whose columns consist of the coordinates of the vectors $\mathcal{C}(a_1), \dots, \mathcal{C}(a_n)$. Then
it is obvious that we have the relationship $A' = CA$, from which it follows that
$|A'| = |C| \cdot |A|$.

To complete the proof, it remains to note that $|\mathcal{C}| = |C|$, and by the definition
of oriented volume, we have the equalities $v(a_1, \dots, a_n) = |A|$ and
$v(\mathcal{C}(a_1), \dots, \mathcal{C}(a_n)) = |A'|$.
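The relation $A' = CA$ used in the proof, and with it formula (7.13), can be checked numerically in the plane. The matrices below are arbitrary sample data, and `det2` and `matmul` are hypothetical helper names; this is a sketch for the case $n = 2$ only.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

C = [[2, 1], [0, 3]]        # matrix of the transformation in an orthonormal basis
A = [[1, 4], [2, 1]]        # columns: a_1 = (1, 2), a_2 = (4, 1)
A_image = matmul(C, A)      # columns: images of a_1 and a_2

print(det2(A_image), det2(C) * det2(A))
```

Taking absolute values of both printed numbers gives the corresponding relation between unoriented volumes.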


It follows from this theorem, of course, that

$$
V(\mathcal{C}(a_1), \dots, \mathcal{C}(a_n)) = \|C\| \, V(a_1, \dots, a_n), \tag{7.14}
$$

where $\|C\|$ denotes the absolute value of the determinant of the matrix $C$ of the
transformation $\mathcal{C}$.

Using the concepts introduced thus far, we may define an analogue of the volume
$V(M)$ for a very broad class of sets $M$ containing all the sets actually encountered
in mathematics and physics. This is the subject of what is called measure theory, but
since it is a topic that is rather far removed from linear algebra, it will not concern
us here. Let us note only that the important relationship (7.14) remains valid here:

$$
V(\mathcal{C}(M)) = \|C\| \, V(M). \tag{7.15}
$$


An interesting example of a set in an $n$-dimensional Euclidean space is the ball $B(r)$
of radius $r$, namely the set of all vectors $x \in L$ such that $|x| \le r$. The set of vectors
$x \in L$ for which $|x| = r$ is called the sphere $S(r)$ of radius $r$. From the relationship
(7.15) it follows that $V(B(r)) = V_n r^n$, where $V_n = V(B(1))$. The calculation of the
interesting geometric constant $V_n$ is a question from analysis, related to the theory
of the gamma function $\Gamma$. Here we shall simply quote the result:

$$
V_n = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}.
$$

It follows from the theory of the gamma function that if $n$ is an even number
($n = 2m$), then $V_n = \pi^m / m!$, and if $n$ is odd ($n = 2m + 1$), then
$V_n = 2^{m+1} \pi^m / (1 \cdot 3 \cdots (2m + 1))$.
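The agreement between the gamma-function expression for $V_n$ and the two closed forms can be checked numerically. The following sketch uses Python's standard `math.gamma`; the function names `unit_ball_volume` and `unit_ball_volume_closed` are illustrative.

```python
import math

def unit_ball_volume(n):
    """V_n computed from the gamma-function formula."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def unit_ball_volume_closed(n):
    """V_n computed from the closed forms for even and odd n."""
    m = n // 2
    if n % 2 == 0:                                    # n = 2m
        return math.pi ** m / math.factorial(m)
    odd_product = math.prod(range(1, 2 * m + 2, 2))   # 1 * 3 * ... * (2m + 1)
    return 2 ** (m + 1) * math.pi ** m / odd_product  # n = 2m + 1

for n in range(1, 7):
    print(n, unit_ball_volume(n), unit_ball_volume_closed(n))
```

For $n = 1, 2, 3$ the values are the familiar ones: the length $2$ of the segment $[-1, 1]$, the area $\pi$ of the unit disk, and the volume $4\pi/3$ of the unit ball.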


7.2 Orthogonal Transformations 


Let $L_1$ and $L_2$ be Euclidean spaces of the same dimension with inner products
$(x, y)_1$ and $(x, y)_2$ defined on them. We shall denote the length of a vector $x$ in
the spaces $L_1$ and $L_2$ by $|x|_1$ and $|x|_2$, respectively.


Definition 7.21 An isomorphism of Euclidean spaces $L_1$ and $L_2$ is an isomorphism
$\mathcal{A} : L_1 \to L_2$ of the underlying vector spaces that preserves the inner product, that
is, for arbitrary vectors $x, y \in L_1$, the following relationship holds:

$$
(x, y)_1 = (\mathcal{A}(x), \mathcal{A}(y))_2. \tag{7.16}
$$

If we substitute the vector $y = x$ into equality (7.16), we obtain that
$|x|_1^2 = |\mathcal{A}(x)|_2^2$, and this implies that $|x|_1 = |\mathcal{A}(x)|_2$, that is, the isomorphism
$\mathcal{A}$ preserves the lengths of vectors.

Conversely, if $\mathcal{A} : L_1 \to L_2$ is an isomorphism of vector spaces that preserves the
lengths of vectors, then $|\mathcal{A}(x + y)|_2^2 = |x + y|_1^2$, and therefore,

$$
|\mathcal{A}(x)|_2^2 + 2(\mathcal{A}(x), \mathcal{A}(y))_2 + |\mathcal{A}(y)|_2^2 = |x|_1^2 + 2(x, y)_1 + |y|_1^2.
$$

But by assumption, we also have the equalities $|\mathcal{A}(x)|_2 = |x|_1$ and $|\mathcal{A}(y)|_2 = |y|_1$,
which implies that $(x, y)_1 = (\mathcal{A}(x), \mathcal{A}(y))_2$. This, strictly speaking, is a
consequence of the fact (Theorem 6.6) that a symmetric bilinear form $(x, y)$ is determined
by the quadratic form $(x, x)$, and here we have simply repeated the proof given in
Sect. 4.1.

If the spaces $L_1$ and $L_2$ have the same dimension, then from the fact that the linear
transformation $\mathcal{A} : L_1 \to L_2$ preserves the lengths of vectors, it already follows that
it is an isomorphism. Indeed, as we saw in Sect. 3.5, it suffices to verify that the
kernel of the transformation $\mathcal{A}$ is equal to $(0)$. But if $\mathcal{A}(x) = 0$, then $|\mathcal{A}(x)|_2 = 0$,
which implies that $|x|_1 = 0$, that is, $x = 0$.


Theorem 7.22 All Euclidean spaces of a given finite dimension are isomorphic to
each other.

Proof From the existence of an orthonormal basis, it follows at once that every
$n$-dimensional Euclidean space is isomorphic to the Euclidean space in Example 7.3.
Indeed, let $e_1, \dots, e_n$ be an orthonormal basis of a Euclidean space $L$. Assigning to
each vector $x \in L$ the row of its coordinates in the basis $e_1, \dots, e_n$, we obtain an
isomorphism of the space $L$ and the space $\mathbb{R}^n$ of rows of length $n$ with inner product
(7.1) (see the remarks on p. 218). It is easily seen that isomorphism is an equivalence
relation (p. xii) on the set of Euclidean spaces, and by transitivity, it follows that all
Euclidean spaces of dimension $n$ are isomorphic to each other.


Theorem 7.22 is analogous to Theorem 3.64 for vector spaces, and its general
meaning is the same (this is elucidated in detail in Sect. 3.5). For example, using
Theorem 7.22, we could have proved the inequality (7.6) differently from how it
was done in the preceding section. Indeed, it is completely obvious (the inequality
is reduced to an equality) if the vectors $x$ and $y$ are linearly dependent. If, on the
other hand, they are linearly independent, then we can consider the subspace
$L' = \langle x, y \rangle$. By Theorem 7.22, it is isomorphic to the plane (Example 7.2 in the previous
section), where this inequality is well known. Therefore, it must also be correct for
arbitrary vectors $x$ and $y$.


Definition 7.23 A linear transformation $\mathcal{U}$ of a Euclidean space $L$ into itself that
preserves the inner product, that is, satisfies the condition that for all vectors $x$ and
$y$,

$$
(x, y) = (\mathcal{U}(x), \mathcal{U}(y)), \tag{7.17}
$$

is said to be orthogonal.


This is clearly a special case of an isomorphism of Euclidean spaces $L_1$ and $L_2$
that coincide.

It is also easily seen that an orthogonal transformation $\mathcal{U}$ takes an orthonormal
basis to another orthonormal basis, since from the conditions (7.8) and (7.17), it
follows that $\mathcal{U}(e_1), \dots, \mathcal{U}(e_n)$ is an orthonormal basis if $e_1, \dots, e_n$ is. Conversely,
if a linear transformation $\mathcal{U}$ takes some orthonormal basis $e_1, \dots, e_n$ to another
orthonormal basis, then for vectors $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ and
$y = \beta_1 e_1 + \cdots + \beta_n e_n$, we have

$$
\mathcal{U}(x) = \alpha_1 \mathcal{U}(e_1) + \cdots + \alpha_n \mathcal{U}(e_n), \qquad
\mathcal{U}(y) = \beta_1 \mathcal{U}(e_1) + \cdots + \beta_n \mathcal{U}(e_n).
$$

Since both $e_1, \dots, e_n$ and $\mathcal{U}(e_1), \dots, \mathcal{U}(e_n)$ are orthonormal bases, it follows by
(7.1) that both the left- and right-hand sides of relationship (7.17) are equal to the
expression $\alpha_1 \beta_1 + \cdots + \alpha_n \beta_n$, that is, relationship (7.17) is satisfied, and this implies
that $\mathcal{U}$ is an orthogonal transformation.

We note the following important reformulation of this fact: for any two orthonormal
bases of a Euclidean space, there exists a unique orthogonal transformation that
takes the first basis into the second.

Let $U = (u_{ij})$ be the matrix of a linear transformation $\mathcal{U}$ in some orthonormal
basis $e_1, \dots, e_n$. It follows from what has gone before that the transformation $\mathcal{U}$ is
orthogonal if and only if the vectors $\mathcal{U}(e_1), \dots, \mathcal{U}(e_n)$ form an orthonormal basis.
But by the definition of the matrix $U$, the vector $\mathcal{U}(e_i)$ is equal to $\sum_{k=1}^{n} u_{ki} e_k$, and
since $e_1, \dots, e_n$ is an orthonormal basis, we have

$$
(\mathcal{U}(e_i), \mathcal{U}(e_j)) = u_{1i} u_{1j} + u_{2i} u_{2j} + \cdots + u_{ni} u_{nj}.
$$

The expression on the right-hand side is equal to the element $c_{ij}$, where the matrix
$(c_{ij})$ is equal to $U^*U$. This implies that the condition of orthogonality of the
transformation $\mathcal{U}$ can be written in the form

$$
U^*U = E, \tag{7.18}
$$

or equivalently, $U^* = U^{-1}$. This equality is equivalent to

$$
UU^* = E, \tag{7.19}
$$

and can be expressed as relationships among the elements of the matrix $U$:

$$
u_{i1} u_{j1} + \cdots + u_{in} u_{jn} = 0 \ \text{ for } i \ne j, \qquad u_{i1}^2 + \cdots + u_{in}^2 = 1. \tag{7.20}
$$

The matrix $U$ satisfying the relationship (7.18) or the equivalent relationship (7.19)
is said to be orthogonal.
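Condition (7.18) translates directly into a numerical test. The sketch below, with illustrative helper names `transpose`, `matmul`, and `is_orthogonal` and a floating-point tolerance `tol` that is an assumption of the example, checks it for a rotation matrix and for a shear, which is not orthogonal.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def is_orthogonal(u, tol=1e-12):
    """Check condition (7.18): the transpose of u times u is the identity matrix."""
    n = len(u)
    p = matmul(transpose(u), u)
    return all(abs(p[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

phi = 0.7
rotation = [[math.cos(phi), -math.sin(phi)], [math.sin(phi), math.cos(phi)]]
shear = [[1.0, 1.0], [0.0, 1.0]]

print(is_orthogonal(rotation), is_orthogonal(shear))
```

Applying `is_orthogonal` to `transpose(rotation)` tests the equivalent condition (7.19).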

The concept of an orthonormal basis of a Euclidean space can be interpreted
more graphically using the notion of flag (see the definition on p. 101). Namely, we
associate with an orthonormal basis $e_1, \dots, e_n$ the flag

$$
(0) \subset L_1 \subset L_2 \subset \cdots \subset L_n = L, \tag{7.21}
$$

in which the subspace $L_i$ is equal to $\langle e_1, \dots, e_i \rangle$, and the pair $(L_{i-1}, L_i)$ is directed
in the sense that $L_i^+$ is the half-space of $L_i$ containing the vector $e_i$. In the case of a
Euclidean space, the essential fact is that we obtain a bijection between orthonormal
bases and flags.

For the proof of this, we have only to verify that the orthonormal basis $e_1, \dots, e_n$
is uniquely determined by its associated flag. Let this basis be associated with
the flag (7.21). If we have already constructed an orthonormal system of vectors
$e_1, \dots, e_{i-1}$ such that $L_{i-1} = \langle e_1, \dots, e_{i-1} \rangle$, then we should consider the orthogonal
complement $L_{i-1}^\perp$ of the subspace $L_{i-1}$ in $L_i$. Then $\dim L_{i-1}^\perp = 1$ and
$L_{i-1}^\perp = \langle e_i \rangle$, where the vector $e_i$ is uniquely defined up to the factor $\pm 1$. This factor
can be selected unambiguously based on the condition $e_i \in L_i^+$.

An observation made earlier can now be interpreted as follows: For any two flags
$\Phi_1$ and $\Phi_2$ of a Euclidean space $L$, there exists a unique orthogonal transformation
that maps $\Phi_1$ to $\Phi_2$.

Our next goal will be the construction of an orthonormal basis in which a given
orthogonal transformation $\mathcal{U}$ has the simplest matrix possible. By Theorem 4.22,
the transformation $\mathcal{U}$ has a one- or two-dimensional invariant subspace $L'$. It is clear
that the restriction of $\mathcal{U}$ to the subspace $L'$ is again an orthogonal transformation.
Let us determine first the sort of transformation that this can be, that is, what sorts
of orthogonal transformations of one- and two-dimensional spaces exist.

If $\dim L' = 1$, then $L' = \langle e \rangle$ for some nonnull vector $e$. Then $\mathcal{U}(e) = \alpha e$, where
$\alpha$ is some scalar. From the orthogonality of the transformation $\mathcal{U}$, we obtain that

$$
(e, e) = (\alpha e, \alpha e) = \alpha^2 (e, e),
$$

from which it follows that $\alpha^2 = 1$, and this implies that $\alpha = \pm 1$. Consequently, in
a one-dimensional space $L'$, there exist two orthogonal transformations: the identity
$\mathcal{E}$, for which $\mathcal{E}(x) = x$ for all vectors $x$, and the transformation $\mathcal{U}$ such that
$\mathcal{U}(x) = -x$. It is obvious that $\mathcal{U} = -\mathcal{E}$.

Now let $\dim L' = 2$, in which case $L'$ is isomorphic to the plane with inner product
(7.1). It is well known from analytic geometry that an orthogonal transformation of
the plane is either a rotation through some angle $\varphi$ about the origin or a reflection
with respect to some line $l$. In the first case, the orthogonal transformation $\mathcal{U}$ in an
arbitrary orthonormal basis of the plane has matrix

$$
\begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix}. \tag{7.22}
$$

In the second case, the plane can be represented in the form of the direct sum
$L' = l \oplus l^\perp$, where $l$ and $l^\perp$ are lines, and for a vector $x$ we have the decomposition
$x = y + z$, where $y \in l$ and $z \in l^\perp$, while the vector $\mathcal{U}(x)$ is equal to $y - z$. If we
choose an orthonormal basis $e_1, e_2$ in such a way that the vector $e_1$ lies on the line
$l$, then the transformation $\mathcal{U}$ will have matrix

$$
U = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{7.23}
$$

But we shall not presuppose this fact from analytic geometry, and instead show
that it derives from simple considerations in linear algebra. Let $\mathcal{U}$ have, in some
orthonormal basis $e_1, e_2$, the matrix

$$
U = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \tag{7.24}
$$

that is, it maps the vector $x e_1 + y e_2$ to $(ax + by) e_1 + (cx + dy) e_2$. The fact that $\mathcal{U}$
preserves the length of a vector gives the relationship

$$
(ax + by)^2 + (cx + dy)^2 = x^2 + y^2
$$

for all $x$ and $y$. Substituting in turn $(1, 0)$, $(0, 1)$, and $(1, 1)$ for $(x, y)$, we obtain

$$
a^2 + c^2 = 1, \qquad b^2 + d^2 = 1, \qquad ab + cd = 0. \tag{7.25}
$$

From the relationship (7.19), it follows that $|UU^*| = 1$, and since $|U^*| = |U|$, it
follows that $|U|^2 = 1$, and this implies that $|U| = \pm 1$. We need to consider separately
the cases of different signs.


If $|U| = -1$, then the characteristic polynomial $|U - tE|$ of the matrix (7.24) is
equal to $t^2 - (a + d)t - 1$ and has positive discriminant. Therefore, the matrix (7.24)
has two real eigenvalues $\lambda_1$ and $\lambda_2$ of opposite signs (since by Viète's theorem,
$\lambda_1 \lambda_2 = -1$) and two associated eigenvectors $e_1$ and $e_2$. Examining the restriction
of $\mathcal{U}$ to the one-dimensional invariant subspaces $\langle e_1 \rangle$ and $\langle e_2 \rangle$, we arrive at the
one-dimensional case considered above, from which, in particular, it follows that
the values $\lambda_1$ and $\lambda_2$ are equal to $\pm 1$. Let us show that the vectors $e_1$ and $e_2$ are
orthogonal. By the definition of eigenvectors, we have the equalities $\mathcal{U}(e_i) = \lambda_i e_i$,
from which we have

$$
(\mathcal{U}(e_1), \mathcal{U}(e_2)) = (\lambda_1 e_1, \lambda_2 e_2) = \lambda_1 \lambda_2 (e_1, e_2). \tag{7.26}
$$

But since the transformation $\mathcal{U}$ is orthogonal, it follows that $(\mathcal{U}(e_1), \mathcal{U}(e_2)) = (e_1, e_2)$,
and from (7.26), we obtain the equality $(e_1, e_2) = \lambda_1 \lambda_2 (e_1, e_2)$. Since $\lambda_1$
and $\lambda_2$ have opposite signs, it follows that $(e_1, e_2) = 0$. Choosing eigenvectors $e_1$
and $e_2$ of unit length and such that $\lambda_1 = 1$ and $\lambda_2 = -1$, we obtain the orthonormal
basis $e_1, e_2$ in which the transformation $\mathcal{U}$ has matrix (7.23). We then have the
decomposition $L = l \oplus l^\perp$, where $l = \langle e_1 \rangle$ and $l^\perp = \langle e_2 \rangle$, and the transformation $\mathcal{U}$ is
a reflection in the line $l$.

But if $|U| = 1$, then by relationship (7.25) for $a, b, c, d$, it is easy to derive, keeping
in mind that $ad - bc = 1$, that there exists an angle $\varphi$ such that $a = d = \cos\varphi$
and $c = -b = \sin\varphi$, that is, the matrix (7.24) has the form (7.22).
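The case analysis above can be summarized as a small classification routine for a real $2 \times 2$ matrix (7.24). This is only a sketch: the function name `classify` and the tolerance `tol` are assumptions of the example. For a reflection it also returns the angle that the line $l$ makes with $e_1$, using the standard fact (not derived in the text) that a reflection in the line at angle $\theta/2$ has first column $(\cos\theta, \sin\theta)$.

```python
import math

def classify(u, tol=1e-9):
    """Classify a 2 x 2 matrix ((a, b), (c, d)) as a rotation or a reflection."""
    (a, b), (c, d) = u
    # Relations (7.25) express that lengths of vectors are preserved.
    preserves_length = (abs(a * a + c * c - 1) < tol and
                        abs(b * b + d * d - 1) < tol and
                        abs(a * b + c * d) < tol)
    if not preserves_length:
        return ("not orthogonal", None)
    if a * d - b * c > 0:
        # Determinant +1: a = d = cos(phi), c = -b = sin(phi), as in (7.22).
        return ("rotation", math.atan2(c, a))
    # Determinant -1: reflection in a line l; report the angle of l with e_1.
    return ("reflection", math.atan2(c, a) / 2)

print(classify([[0.0, -1.0], [1.0, 0.0]]))
print(classify([[1.0, 0.0], [0.0, -1.0]]))
print(classify([[1.0, 1.0], [0.0, 1.0]]))
```

The second call uses the matrix (7.23), for which the line $l$ is spanned by $e_1$.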

As a basis for examining the general case, we have the following theorem. 


Theorem 7.24 If a subspace $L'$ is invariant with respect to an orthogonal
transformation $\mathcal{U}$, then its orthogonal complement $(L')^\perp$ is also invariant with respect
to $\mathcal{U}$.

Proof We must show that for every vector $y \in (L')^\perp$, we have $\mathcal{U}(y) \in (L')^\perp$. If
$y \in (L')^\perp$, then $(x, y) = 0$ for all $x \in L'$. From the orthogonality of the transformation
$\mathcal{U}$, we obtain that $(\mathcal{U}(x), \mathcal{U}(y)) = (x, y) = 0$. Since $\mathcal{U}$ is a bijective mapping
from $L$ to $L$, its restriction to the invariant subspace $L'$ is a bijection from $L'$ to $L'$. In
other words, every vector $x' \in L'$ can be represented in the form $x' = \mathcal{U}(x)$, where
$x$ is some other vector in $L'$. Consequently, $(x', \mathcal{U}(y)) = 0$ for every vector $x' \in L'$,
and this implies that $\mathcal{U}(y) \in (L')^\perp$.


Remark 7.25 In the proof of Theorem 7.24, we nowhere used the positive definite- 
ness of the quadratic form (x, x) associated with the inner product (x, y). Indeed, 
this theorem holds as well for an arbitrary nonsingular bilinear form (x, y). The 
condition of nonsingularity is required in order that the restriction of the transfor- 
mation U to an invariant subspace be a bijection, without which the theorem would 
not be true. 


Definition 7.26 Subspaces $L_1$ and $L_2$ of a Euclidean space are said to be mutually
orthogonal if $(x, y) = 0$ for all vectors $x \in L_1$ and $y \in L_2$. In such a case, we write
$L_1 \perp L_2$. The decomposition of a Euclidean space as a direct sum of orthogonal
subspaces is called an orthogonal decomposition.

If $\dim L > 2$, then by Theorem 4.22, the transformation $\mathcal{U}$ has a one- or
two-dimensional invariant subspace. Thus using Theorem 7.24 as many times as necessary
(depending on $\dim L$), we obtain the orthogonal decomposition

$$
L = L_1 \oplus L_2 \oplus \cdots \oplus L_k, \quad \text{where } L_i \perp L_j \text{ for all } i \ne j, \tag{7.27}
$$

with all subspaces $L_i$ invariant with respect to the transformation $\mathcal{U}$ and of dimension
1 or 2.

Combining the orthonormal bases of the subspaces $L_1, \dots, L_k$ and choosing a
convenient ordering, we obtain the following result.


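Theorem 7.27 For every orthogonal transformation there exists an orthonormal
basis in which the matrix of the transformation has the block-diagonal form

$$
\begin{pmatrix}
1 & & & & & & & & \\
& \ddots & & & & & & 0 & \\
& & 1 & & & & & & \\
& & & -1 & & & & & \\
& & & & \ddots & & & & \\
& & & & & -1 & & & \\
& & & & & & A_{\varphi_1} & & \\
& 0 & & & & & & \ddots & \\
& & & & & & & & A_{\varphi_r}
\end{pmatrix}, \tag{7.28}
$$

where

$$
A_{\varphi_i} = \begin{pmatrix} \cos\varphi_i & -\sin\varphi_i \\ \sin\varphi_i & \cos\varphi_i \end{pmatrix}, \tag{7.29}
$$

$\varphi_i \ne \pi k$, $k \in \mathbb{Z}$.

Let us note that the determinants of all the matrices (7.29) are equal to 1, and
therefore, for a proper orthogonal transformation (see the definition on p. 135), the
number of $-1$'s on the main diagonal in (7.28) is even, and for an improper orthogonal
transformation, that number is odd.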

Let us now look at what the theorems we have proved give us in the cases
$n = 1, 2, 3$ familiar from analytic geometry.

For $n = 1$, there exist, as we have already seen, altogether two orthogonal
transformations, namely $\mathcal{E}$ and $-\mathcal{E}$, the first of which is proper, and the second, improper.

For $n = 2$, a proper orthogonal transformation is a rotation of the plane through
some angle $\varphi$. In an arbitrary orthonormal basis, its matrix has the form $A_\varphi$ from
(7.29), with no restriction on the angle $\varphi$. For the improper transformation appearing
in (7.28), the number $-1$ must be encountered an odd number of times, that is, once.
This implies that in some orthonormal basis $e_1, e_2$, its matrix has the form

$$
\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}.
$$

This transformation is a reflection of the plane with respect to the line $\langle e_2 \rangle$ (Fig. 7.3).

Fig. 7.3 Reflection of the plane with respect to a line
Let us now consider the case $n = 3$. Since the characteristic polynomial of the
transformation $\mathcal{U}$ has odd degree 3, it must have at least one real root. This implies
that in the representation (7.28), the number $+1$ or $-1$ must appear on the main
diagonal of the matrix.

Let us consider proper transformations first. In this case, for the matrix (7.28),
we have only one possibility:

$$
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}.
$$

If the matrix is written in the basis $e_1, e_2, e_3$, then the transformation $\mathcal{U}$ does not
change the points of the line $l = \langle e_1 \rangle$ and represents a rotation through the angle $\varphi$
in the plane $\langle e_2, e_3 \rangle$. In this case, we say that the transformation $\mathcal{U}$ is a rotation
through the angle $\varphi$ about the axis $l$. That every proper orthogonal
transformation of a three-dimensional Euclidean space possesses a “rotational axis”
is a result first proved by Euler. We shall discuss the mechanical significance of this
assertion later, in connection with motions of affine spaces.

Finally, if an orthogonal transformation is improper, then in expression (7.28),
we have only the possibility

$$
\begin{pmatrix} -1 & 0 & 0 \\ 0 & \cos\varphi & -\sin\varphi \\ 0 & \sin\varphi & \cos\varphi \end{pmatrix}.
$$

In this case, the orthogonal transformation $\mathcal{U}$ reduces to a rotation about the $l$-axis
with a simultaneous reflection with respect to the plane $l^\perp$.




7.3 Orientation of a Euclidean Space* 


In a Euclidean space, as in any real vector space, there are defined the notions 
of equal and opposite orientations of two bases and orientation of the space (see 
Sect. 4.4). But in Euclidean spaces, these notions possess certain specific features. 

Let $e_1, \dots, e_n$ and $e'_1, \dots, e'_n$ be two orthonormal bases of a Euclidean space $L$.
By general definition, they have equal orientations if the transformation from one
basis to the other is proper. This implies that for a transformation $\mathcal{U}$ such that

$$
\mathcal{U}(e_1) = e'_1, \quad \dots, \quad \mathcal{U}(e_n) = e'_n,
$$

the determinant of its matrix is positive. But in the case that both bases under
consideration are orthonormal, the mapping $\mathcal{U}$, as we know, is orthogonal, and its matrix
$U$ satisfies the relationship $|U| = \pm 1$. This implies that $\mathcal{U}$ is a proper transformation
if and only if $|U| = 1$, and it is improper if and only if $|U| = -1$. We have the
following analogue to Theorems 4.38–4.40 of Sect. 4.4.


Theorem 7.28 Two orthogonal transformations of a real Euclidean space can be 
continuously deformed into each other if and only if the signs of their determinants 
coincide. 


The definition of a continuous deformation repeats here the definition given in
Sect. 4.4 for the set $\mathfrak{A}$, but now consisting only of orthogonal matrices (or
transformations). Since the product of any two orthogonal transformations is again
orthogonal, Lemma 4.37 (p. 159) is also valid in this case, and we shall make use of
it.


Proof of Theorem 7.28 Let us show that an arbitrary proper orthogonal transformation
$\mathcal{U}$ can be continuously deformed into the identity. Since the condition of
continuous deformability defines an equivalence relation on the set of orthogonal
transformations, by transitivity, the assertion of the theorem will follow for all
proper transformations.

Thus we must prove that there exists a family of orthogonal transformations $\mathcal{U}_t$
depending continuously on the parameter $t \in [0, 1]$ for which $\mathcal{U}_0 = \mathcal{E}$ and $\mathcal{U}_1 = \mathcal{U}$.
The continuous dependence of $\mathcal{U}_t$ implies that when it is represented in an arbitrary
basis, all the elements of the matrices of the transformations $\mathcal{U}_t$ are continuous
functions of $t$. We note that this is not at all an obvious corollary to Theorem 4.38.
Indeed, that theorem does not guarantee that all the intermediate transformations
$\mathcal{U}_t$ for $0 < t < 1$ are orthogonal. A possible “bad” deformation $\mathcal{A}_t$ taking us out of the
domain of orthogonal transformations is depicted as the dotted line in Fig. 7.4.

Fig. 7.4 Deformation taking us outside the domain of orthogonal transformations (regions labeled “orthogonal transformations” and “nonorthogonal transformations”)

We shall use Theorem 7.27 and examine the orthonormal basis in which the
matrix of the transformation $\mathcal{U}$ has the form (7.28). The transformation $\mathcal{U}$ is proper
if and only if the number of instances of $-1$ on the main diagonal of (7.28) is even.
We observe that the second-order matrix

$$
\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}
$$

can also be written in the form (7.29) for $\varphi_i = \pi$. Thus a proper orthogonal
transformation can be written in a suitable orthonormal basis in block-diagonal form

$$
\begin{pmatrix}
E & & & \\
& A_{\varphi_1} & & \\
& & \ddots & \\
& & & A_{\varphi_r}
\end{pmatrix}, \tag{7.30}
$$

where the arguments $\varphi_i$ can now be taken to be any values. Formula (7.30) in fact
gives a continuous deformation of the transformation $\mathcal{U}$ into $\mathcal{E}$. To maintain agreement
with our notation, let us examine the transformations $\mathcal{U}_t$ having in this same
basis the matrix

$$
\begin{pmatrix}
E & & & \\
& A_{t\varphi_1} & & \\
& & \ddots & \\
& & & A_{t\varphi_r}
\end{pmatrix}. \tag{7.31}
$$

Then it is clear first of all that the transformation $\mathcal{U}_t$ is orthogonal for every $t$, and
secondly, that $\mathcal{U}_0 = \mathcal{E}$ and $\mathcal{U}_1 = \mathcal{U}$. This gives us a proof of the theorem in the case
of a proper transformation.

Let us now consider improper orthogonal transformations and show that any such
transformation $\mathcal{V}$ can be continuously deformed into a reflection with respect to a
hyperplane, that is, into a transformation $\mathcal{F}$ having in some orthonormal basis the
matrix

$$
F = \begin{pmatrix}
-1 & & & 0 \\
& 1 & & \\
& & \ddots & \\
0 & & & 1
\end{pmatrix}. \tag{7.32}
$$

Let us choose an arbitrary orthonormal basis of the vector space and suppose that in
this basis, the improper orthogonal transformation $\mathcal{V}$ has matrix $V$. Then it is obvious
that the transformation $\mathcal{U}$ with matrix $U = VF$ in this same basis is a proper
orthogonal transformation. Taking into account the obvious relationship $F^{-1} = F$,
we have $V = UF$, that is, $\mathcal{V} = \mathcal{U}\mathcal{F}$. We shall use the family $\mathcal{U}_t$ effecting a continuous
deformation of the proper transformation $\mathcal{U}$ into $\mathcal{E}$. From the preceding
equality, with the help of Lemma 4.37, we obtain the continuous family $\mathcal{V}_t = \mathcal{U}_t \mathcal{F}$,
where $\mathcal{V}_0 = \mathcal{E}\mathcal{F} = \mathcal{F}$ and $\mathcal{V}_1 = \mathcal{U}\mathcal{F} = \mathcal{V}$. Thus the family $\mathcal{V}_t = \mathcal{U}_t \mathcal{F}$ effects the
deformation of the improper transformation $\mathcal{V}$ into $\mathcal{F}$.

Fig. 7.5 Oriented length (labels: B, O, e, A)
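For the plane, the deformation constructed in the proof can be followed numerically. The sketch below assumes $n = 2$, so that the proper factor is a single rotation block $A_\varphi$ and the family $\mathcal{U}_t$ has matrix $A_{t\varphi}$; the angle `phi = 2.0` is arbitrary sample data, and `rot`, `V_t`, `is_orthogonal`, and `det2` are hypothetical helper names.

```python
import math

def rot(phi):
    """Rotation block A_phi from (7.29)."""
    return [[math.cos(phi), -math.sin(phi)], [math.sin(phi), math.cos(phi)]]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def det2(m):
    (a, b), (c, d) = m
    return a * d - b * c

def is_orthogonal(u, tol=1e-12):
    p = matmul([list(col) for col in zip(*u)], u)   # transpose of u times u
    return all(abs(p[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(2) for j in range(2))

F = [[-1.0, 0.0], [0.0, 1.0]]      # reflection (7.32) for n = 2
phi = 2.0
V = matmul(rot(phi), F)            # an improper orthogonal matrix written as U F

def V_t(t):
    """Matrix of the deformation at parameter t, with U_t = A_{t * phi}."""
    return matmul(rot(t * phi), F)

for k in range(0, 11, 5):
    t = k / 10
    print(t, is_orthogonal(V_t(t)), round(det2(V_t(t)), 12))
```

Every intermediate matrix stays orthogonal with determinant $-1$, so the path never leaves the set of improper orthogonal transformations.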


In analogy to what we did in Sect. 4.4, Theorem 7.28 gives us the following
topological result: the set of orthogonal transformations consists of two path-connected
components: the proper and improper orthogonal transformations.

Exactly as in Sect. 4.4, from what we have proved, it also follows that two equally
oriented orthonormal bases can be continuously deformed into each other. That is, if
$e_1, \dots, e_n$ and $e'_1, \dots, e'_n$ are orthonormal bases with the same orientation, then there
exists a family of orthonormal bases $e_1(t), \dots, e_n(t)$ depending continuously on
the parameter $t \in [0, 1]$ such that $e_i(0) = e_i$ and $e_i(1) = e'_i$. In other words, the
concept of orientation of a space is the same whether we define it in terms of an
arbitrary basis or an orthonormal one. We shall further examine oriented Euclidean
spaces, choosing an orientation arbitrarily. This choice makes it possible to speak of
positively and negatively oriented orthonormal bases.

Now we can compare the concepts of oriented and unoriented volume. These two
numbers differ by the factor $\pm 1$ (unoriented volumes are nonnegative by definition).
When the oriented volume of a parallelepiped $\Pi(a_1, \ldots, a_n)$ in a space $L$ of
dimension $n$ was introduced, we noted that its definition depends on the choice of some
orthonormal basis $e_1, \ldots, e_n$. Since we are assuming that the space $L$ is oriented, we
can include in the definition of oriented volume of a parallelepiped $\Pi(a_1, \ldots, a_n)$
the condition that the basis $e_1, \ldots, e_n$ used in the definition of $v(a_1, \ldots, a_n)$ be
positively oriented. Then the number $v(a_1, \ldots, a_n)$ does not depend on the choice
of basis (that is, it remains unchanged if instead of $e_1, \ldots, e_n$, we take any other
orthonormal positively oriented basis $e'_1, \ldots, e'_n$). This follows immediately from
formula (7.13) for the transformation $\mathcal{C} = \mathcal{U}$ and from the fact that the
transformation $\mathcal{U}$ taking one basis to the other is orthogonal and proper, that is, $|U| = 1$.

We can now say that the oriented volume $v(a_1, \ldots, a_n)$ is positive (and
consequently equal to the unoriented volume) if the bases $e_1, \ldots, e_n$ and $a_1, \ldots, a_n$ are
equally oriented, and is negative (that is, it differs from the unoriented volume by a
sign) if these bases have opposite orientations. For example, on the line (Fig. 7.5),
the oriented length of the segment $OA$ is equal to $2$, while the oriented length of the
segment $OB$ is equal to $-2$.
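
The sign behaviour of the oriented volume can be seen in a small computation. The sketch below assumes that the vectors are given by their coordinates in a positively oriented orthonormal basis and that the oriented volume is the determinant of the matrix of these coordinates; the function names are illustrative, and the cofactor expansion is used only because it is short.

```python
def det(rows):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(rows) == 1:
        return rows[0][0]
    return sum((-1) ** j * rows[0][j]
               * det([r[:j] + r[j + 1:] for r in rows[1:]])
               for j in range(len(rows)))

def oriented_volume(vectors):
    """Oriented volume of the parallelepiped spanned by the given vectors,
    whose coordinates refer to a positively oriented orthonormal basis."""
    return det([list(v) for v in vectors])

def unoriented_volume(vectors):
    return abs(oriented_volume(vectors))
```

On the line, `oriented_volume([(2,)])` and `oriented_volume([(-2,)])` give $2$ and $-2$, in agreement with the segments $OA$ and $OB$ of Fig. 7.5; in the plane, interchanging the two vectors changes the sign but not the absolute value.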

Thus, we may say that for the parallelepiped $\Pi(a_1, \ldots, a_n)$, its oriented volume
is its “volume with orientation.”

If we choose a coordinate origin on the real line, then a basis of it consists of
a single vector, and vectors $e_1$ and $\alpha e_1$ are equally oriented if they lie to one side
of the origin, that is, $\alpha > 0$. The choice of orientation on the line, one might say,
corresponds to the choice of “right” and “left.”

In the real plane, the orientation given by the basis $e_1, e_2$ is determined by the
“direction of rotation” from $e_1$ to $e_2$: clockwise or counterclockwise. Equally
oriented bases $e_1, e_2$ and $e'_1, e'_2$ (Fig. 7.6(a) and (b)) can be continuously transformed
one into the other, while oppositely oriented bases cannot, even if they form equal
figures (Fig. 7.6(a) and (c)), since what is required for this is a reflection, that is, an
improper transformation.

Fig. 7.6 Oriented bases of the plane: panels (a), (b), (c)

In real three-dimensional space, the orientation is defined by a basis of three
orthonormal vectors. We again meet with two opposite orientations, which are
represented by our right and left hands (see Fig. 7.7(a)). Another method of providing
an orientation in three-dimensional space is defined by a helix (Fig. 7.7(b)). In this
case, the orientation is defined by the direction in which the helix turns as it rises:
clockwise or counterclockwise.²


7.4 Examples* 


Example 7.29 By the term “figure” in a Euclidean space $L$ we shall understand an
arbitrary subset $S \subset L$. Two figures $S$ and $S'$ contained in a Euclidean space $M$ of
dimension $n$ are said to be congruent, or geometrically identical, if there exists an
orthogonal transformation $\mathcal{U}$ of the space $M$ taking $S$ to $S'$. We shall be interested
in the following question: When are figures $S$ and $S'$ congruent, that is, when do we
have $\mathcal{U}(S) = S'$?

Let us first deal with the case in which the figures $S$ and $S'$ consist of collections
of $m$ vectors: $S = (a_1, \ldots, a_m)$ and $S' = (a'_1, \ldots, a'_m)$ with $m \le n$. For $S$ and $S'$
to be congruent is equivalent to the existence of an orthogonal transformation $\mathcal{U}$
such that $\mathcal{U}(a_i) = a'_i$ for all $i = 1, \ldots, m$. For this, of course, it is necessary that the


Fig. 7.7 Different orientations of three-dimensional space: panels (a) and (b)


² The molecules of amino acids likewise determine a certain orientation of space. In biology, the
two possible orientations are designated by D (right = dexter in Latin) and L (left = laevus). For
some unknown reason, they all determine the same orientation, namely the counterclockwise one.




following equality holds:

$$(a_i, a_j) = (a'_i, a'_j), \quad i, j = 1, \ldots, m. \tag{7.33}$$

Let us assume that vectors $a_1, \ldots, a_m$ are linearly independent, and we shall
then prove that the condition (7.33) is sufficient. By Theorem 7.14, in this case
we have $G(a_1, \ldots, a_m) > 0$, and by assumption, $G(a'_1, \ldots, a'_m) = G(a_1, \ldots, a_m)$.
From this same theorem, it follows that the vectors $a'_1, \ldots, a'_m$ will also be linearly
independent.
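Let us set

$$L = \langle a_1, \ldots, a_m \rangle, \qquad L' = \langle a'_1, \ldots, a'_m \rangle, \tag{7.34}$$

and consider first the case $m = n$. Let $M = \langle a_1, \ldots, a_m \rangle$. We shall consider the
transformation $\mathcal{U}: M \to M$ given by the conditions $\mathcal{U}(a_i) = a'_i$ for all $i = 1, \ldots, m$.
Obviously, such a transformation is uniquely determined, and by the relationship

$$\Big( \mathcal{U}\Big( \sum_{i=1}^m \alpha_i a_i \Big), \mathcal{U}\Big( \sum_{j=1}^m \beta_j a_j \Big) \Big) = \Big( \sum_{i=1}^m \alpha_i a'_i, \sum_{j=1}^m \beta_j a'_j \Big) = \sum_{i,j=1}^m \alpha_i \beta_j (a'_i, a'_j)$$

and equality (7.33), it is orthogonal.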

Let $m < n$. Then we have the decomposition $M = L \oplus L^\perp = L' \oplus (L')^\perp$, where
the subspaces $L$ and $L'$ of the space $M$ are defined by formula (7.34). By what has
gone before, there exists an isomorphism $\mathcal{V}: L \to L'$ such that $\mathcal{V}(a_i) = a'_i$ for all
$i = 1, \ldots, m$. The orthogonal complements $L^\perp$ and $(L')^\perp$ of these subspaces have
dimension $n - m$, and consequently, are also isomorphic (Theorem 7.22). Let us
choose an arbitrary isomorphism $\mathcal{W}: L^\perp \to (L')^\perp$. As a result of the decomposition
$M = L \oplus L^\perp$, an arbitrary vector $x \in M$ can be uniquely represented in the form
$x = y + z$, where $y \in L$ and $z \in L^\perp$. Let us define the linear transformation $\mathcal{U}: M \to M$
by the formula $\mathcal{U}(x) = \mathcal{V}(y) + \mathcal{W}(z)$. By construction, $\mathcal{U}(a_i) = a'_i$ for all
$i = 1, \ldots, m$, and a trivial verification shows that the transformation $\mathcal{U}$ is orthogonal.
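
Condition (7.33) says that the two collections of vectors have the same matrix of pairwise inner products, so for linearly independent collections congruence can be tested by comparing these matrices. A minimal sketch follows, assuming the vectors are given by coordinates in an orthonormal basis; the helper names `gram` and `satisfies_7_33` and the sample collections `S`, `T`, `W` are illustrative.

```python
def inner(x, y):
    """Inner product of two coordinate vectors in an orthonormal basis."""
    return sum(a * b for a, b in zip(x, y))

def gram(vectors):
    """Matrix of pairwise inner products (a_i, a_j)."""
    return [[inner(u, v) for v in vectors] for u in vectors]

def satisfies_7_33(s, t, tol=1e-12):
    """Check condition (7.33): (a_i, a_j) = (a'_i, a'_j) for all i, j."""
    if len(s) != len(t):
        return False
    return all(abs(a - b) <= tol
               for row_s, row_t in zip(gram(s), gram(t))
               for a, b in zip(row_s, row_t))

S = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
T = [(0.0, 1.0, 0.0), (-1.0, 1.0, 0.0)]   # S rotated by 90 degrees about the third axis
W = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]    # different angle between the two vectors
```

Here `satisfies_7_33(S, T)` holds, as it must for a rotated copy, while `satisfies_7_33(S, W)` fails.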

Let us now consider the case that $S = l$ and $S' = l'$ are lines, and consequently,
consist of an infinite number of vectors. It suffices to set $l = \langle e \rangle$ and $l' = \langle e' \rangle$, where
$|e| = |e'| = 1$, and to use the fact that there exists an orthogonal transformation $\mathcal{U}$
of the space $M$ taking $e$ to $e'$. Thus any two lines are congruent.

The next case in order of increasing complexity is that in which figures $S$ and
$S'$ each consist of two lines: $S = l_1 \cup l_2$ and $S' = l'_1 \cup l'_2$. Let us set $l_i = \langle e_i \rangle$ and
$l'_i = \langle e'_i \rangle$, where $|e_i| = |e'_i| = 1$ for $i = 1$ and $2$. Now, however, the vectors $e_1$ and $e_2$
are no longer defined uniquely, but can be replaced by $-e_1$ or $-e_2$. In this case,
their lengths do not change, but the inner product $(e_1, e_2)$ can change its sign,
that is, what remains unchanged is only its absolute value $|(e_1, e_2)|$. Based on
previous considerations, we may say that figures $S$ and $S'$ are congruent if and only
if $|(e_1, e_2)| = |(e'_1, e'_2)|$. If $\varphi$ is the angle between the vectors $e_1$ and $e_2$, then we
see that the lines $l_1$ and $l_2$ determine $|\cos\varphi|$, or equivalently the angle $\varphi$ for which
$0 \le \varphi \le \pi/2$. In textbooks on geometry, one often reads about two angles between
straight lines, the “acute” and “obtuse” angles, but we shall choose only the one that
is acute or a right angle. This angle $\varphi$ is called the angle between the lines $l_1$ and $l_2$.
The previous exposition shows that two pairs of lines $l_1, l_2$ and $l'_1, l'_2$ are congruent
if and only if the angles between them thus defined coincide.
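
The angle between two lines can be computed directly from direction vectors, and the absolute value in the formula makes the result independent of the sign of either vector. The following sketch assumes coordinates in an orthonormal basis and nonzero direction vectors that need not be normalized; `angle_between_lines` is an illustrative name.

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def angle_between_lines(e1, e2):
    """Angle in [0, pi/2] between the lines <e1> and <e2>."""
    cos_phi = abs(inner(e1, e2)) / math.sqrt(inner(e1, e1) * inner(e2, e2))
    # Guard against rounding slightly above 1 before taking the arccosine.
    return math.acos(min(1.0, cos_phi))
```

For instance, the direction vectors $(1, 1)$ and $(-1, 1)$ make angles of $45^\circ$ and $135^\circ$ with $(1, 0)$ as vectors, but the corresponding lines both form the angle $\pi/4$ with the first coordinate axis.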

The case in which a figure $S$ consists of a line $l$ and a plane $L$ ($\dim l = 1$,
$\dim L = 2$) is also related, strictly speaking, to elementary geometry, since
$\dim(l + L) \le 3$, and the figure $S = l \cup L$ can be embedded in three-dimensional space. But we
shall consider it from a more abstract point of view, using the language of Euclidean
spaces. Let $l = \langle e \rangle$ and let $f$ be the orthogonal projection of $e$ onto $L$. The angle
$\varphi$ between the lines $l$ and $l' = \langle f \rangle$ is called the angle between $l$ and $L$ (as already
mentioned above, it is acute or right). The cosine of this angle can be calculated
according to the following formula:

$$\cos\varphi = \frac{|(e, f)|}{|e| \cdot |f|}. \tag{7.35}$$

Let us show that if the angle between the line $l$ and the plane $L$ is equal to the
angle between the line $l'$ and the plane $L'$, then the figures $S = l \cup L$ and $S' = l' \cup L'$
are congruent. First of all, it is obvious that there exists an orthogonal transformation
taking $L$ to $L'$, so that we may consider that $L = L'$. Let $l = \langle e \rangle$, $|e| = 1$ and $l' = \langle e' \rangle$,
$|e'| = 1$, and let us denote by $f$ and $f'$ the orthogonal projections of $e$ and $e'$ onto $L$.
By assumption,

$$\frac{|(e, f)|}{|e| \cdot |f|} = \frac{|(e', f')|}{|e'| \cdot |f'|}. \tag{7.36}$$

Since $e$ and $e'$ can be represented in the form $e = f + x$ and $e' = f' + y$,
where $x, y \in L^\perp$, it follows that $|(e, f)| = |f|^2$, $|(e', f')| = |f'|^2$. Moreover, $|e| =
|e'| = 1$, and the relationship (7.36) shows that $|f| = |f'|$.

Since $e = x + f$, we have $|e|^2 = |x|^2 + 2(x, f) + |f|^2$, from which, if we take
into account the equalities $|e|^2 = 1$ and $(x, f) = 0$, we obtain $|x|^2 = 1 - |f|^2$ and
analogously, $|y|^2 = 1 - |f'|^2$. From this follows the equality $|x| = |y|$. Let us
define the orthogonal transformation $\mathcal{U}$ of the space $M = L \oplus L^\perp$ whose restriction to
the plane $L$ carries the vector $f$ to $f'$ (this is possible because $|f| = |f'|$), while
the restriction to its orthogonal complement $L^\perp$ takes the vector $x$ to $y$ (which is
possible on account of the equality $|x| = |y|$). Clearly, $\mathcal{U}$ takes $e$ to $e'$ and hence $l$
to $l'$, and by construction, the plane $L$ in both figures is one and the same, and the
transformation $\mathcal{U}$ takes it into itself.
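
Formula (7.35) is easy to evaluate once the plane is described by an orthonormal basis. The sketch below assumes such a basis `g1`, `g2` of the plane $L$ is given in coordinates and that `e` is any nonzero direction vector of the line; the convention of returning $\pi/2$ when the projection vanishes (the line is orthogonal to the plane) is an assumption added here, since (7.35) is undefined for $f = 0$.

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def angle_line_plane(e, g1, g2):
    """Angle between the line <e> and the plane with orthonormal basis g1, g2."""
    # Orthogonal projection f of e onto the plane.
    c1, c2 = inner(e, g1), inner(e, g2)
    f = [c1 * a + c2 * b for a, b in zip(g1, g2)]
    norm_f = math.sqrt(inner(f, f))
    if norm_f == 0.0:
        return math.pi / 2          # the line is orthogonal to the plane
    # Formula (7.35): cos(phi) = |(e, f)| / (|e| |f|).
    cos_phi = abs(inner(e, f)) / (math.sqrt(inner(e, e)) * norm_f)
    return math.acos(min(1.0, cos_phi))
```

With the coordinate plane spanned by $(1, 0, 0)$ and $(0, 1, 0)$, the line through $(1, 0, 1)$ forms the angle $\pi/4$ with the plane.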

We encounter a new and more interesting situation when we consider the case
in which a figure $S$ consists of a pair of planes $L_1$ and $L_2$ ($\dim L_1 = \dim L_2 = 2$).
If $L_1 \cap L_2 \ne (0)$, then $\dim(L_1 + L_2) \le 3$, and we are dealing with a question from
elementary geometry (which, however, can be considered simply in the language of
Euclidean spaces). Therefore, we shall assume that $L_1 \cap L_2 = (0)$ and similarly, that
$L'_1 \cap L'_2 = (0)$. When are figures $S = L_1 \cup L_2$ and $S' = L'_1 \cup L'_2$ congruent? It turns
out that for this to occur, it is necessary that there be agreement of not one (as in the
examples considered above) but two parameters, which can be interpreted as two
angles between the planes $L_1$ and $L_2$.


We shall consider all possible straight lines lying in the plane $L_1$ and the angles
that they form with the plane $L_2$. To this end, we recall the geometric interpretation
of the angle between a line $l$ and a plane $L$. If $l = \langle e \rangle$, where $|e| = 1$, then the angle
$\varphi$ between $l$ and $L$ is determined by formula (7.35) with the condition $0 \le \varphi \le \pi/2$,
where $f$ is the orthogonal projection of the vector $e$ onto $L$. From this, it follows that
$e = f + x$, where $x \in L^\perp$, and this implies that $(e, f) = (f, f) + (x, f) = |f|^2$,
whence the relationship (7.35) gives $|\cos\varphi| = |f|$. In other words, to consider all
the angles between lines lying in the plane $L_1$ and the plane $L_2$, we must consider
the circle in $L_1$ consisting of all vectors of length 1 and the lengths of the orthogonal
projections of these vectors onto the plane $L_2$. In order to write down these angles
in a formula, we shall consider the orthogonal projection $M \to L_2$ of the space $M$
onto the plane $L_2$. Let us denote by $\mathcal{P}$ the restriction of this linear transformation
to the plane $L_1$. Then the angles of interest to us are given by the formula $|\cos\varphi| =
|\mathcal{P}(e)|$, where $e$ runs over all possible vectors in the plane $L_1$ of unit length. We restrict
our attention to the case in which the linear transformation $\mathcal{P}$ is an isomorphism.
The case in which this does not occur, that is, when the kernel of the transformation
$\mathcal{P}$ is not equal to $(0)$ and the image is not equal to $L_2$, is dealt with similarly.

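Since $\mathcal{P}$ is an isomorphism, there is an inverse transformation $\mathcal{P}^{-1}: L_2 \to L_1$.
Let us choose in the planes $L_1$ and $L_2$ orthonormal bases $e_1, e_2$ and $g_1, g_2$. Let the
vector $e \in L_1$ have unit length. We set $f = \mathcal{P}(e)$, and assuming that $f = x_1 g_1 +
x_2 g_2$, we shall obtain equations for the coordinates $x_1$ and $x_2$. Let us set

$$\mathcal{P}^{-1}(g_1) = \alpha e_1 + \beta e_2, \qquad \mathcal{P}^{-1}(g_2) = \gamma e_1 + \delta e_2.$$

Since $f = \mathcal{P}(e)$, it follows that

$$e = \mathcal{P}^{-1}(f) = x_1 \mathcal{P}^{-1}(g_1) + x_2 \mathcal{P}^{-1}(g_2) = (\alpha x_1 + \gamma x_2) e_1 + (\beta x_1 + \delta x_2) e_2,$$

and the condition $|\mathcal{P}^{-1}(f)| = 1$, which we shall write in the form $|\mathcal{P}^{-1}(f)|^2 = 1$,
reduces to the equality $(\alpha x_1 + \gamma x_2)^2 + (\beta x_1 + \delta x_2)^2 = 1$, that is,

$$(\alpha^2 + \beta^2) x_1^2 + 2(\alpha\gamma + \beta\delta) x_1 x_2 + (\gamma^2 + \delta^2) x_2^2 = 1. \tag{7.37}$$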


Equation (7.37) with variables $x_1, x_2$ defines a second-degree curve in the
rectangular coordinate system determined by the vectors $g_1$ and $g_2$. This curve is
bounded, since $|f| \le |e|$ ($f$ is the orthogonal projection of the vector $e$), and this
implies that $(f, f) \le 1$, that is, $x_1^2 + x_2^2 \le 1$. As one learns in a course on analytic
geometry, such a curve is an ellipse. In our case, it has its center of symmetry at the
origin $O$, that is, it is unchanged by a change of variables $x_1 \to -x_1$, $x_2 \to -x_2$
(see Fig. 7.8).

It is known from analytic geometry that an ellipse has two distinguished points $A$
and $A'$, symmetric with respect to the origin, such that the length $|OA| = |OA'|$ is
greater than $|OC|$ for all other points $C$ of the ellipse. The segment $OA$ (or $OA'$)
is called the semimajor axis of the ellipse. Similarly, there exist points $B$ and $B'$
symmetric with respect to the origin such that the segment $OB$ (or $OB'$) is shorter
than every other segment $OC$. The segment $OB$ (or $OB'$) is called the semiminor
axis of the ellipse.


Fig. 7.8 Ellipse described by equation (7.37)


Let us recall that the length of an arbitrary line segment $|OC|$, where $C$ is any
point on the ellipse, gives us the value $\cos\varphi$, where $\varphi$ is the angle between a certain
line contained in $L_1$ and the plane $L_2$. From this it follows that $\cos\varphi$ attains its
maximum for one value of $\varphi$, while for some other value of $\varphi$ it attains its minimum.
Let us denote these angles by $\varphi_1$ and $\varphi_2$ respectively. By definition, $0 \le \varphi_1 \le \varphi_2 \le
\pi/2$. It is these two angles that are called the angles between the planes $L_1$ and $L_2$.

The case that we have omitted, in which the transformation $\mathcal{P}$ has a nonnull
kernel, reduces to the case in which the ellipse depicted in Fig. 7.8 shrinks to a line
segment.
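
The two angles between planes can be computed without drawing the ellipse. The sketch below is one possible route, not the text's: it assumes orthonormal bases `e1, e2` of $L_1$ and `g1, g2` of $L_2$ are given in coordinates, writes the matrix $P$ of the projection in these bases, and uses the standard fact that the largest and smallest values of $|\mathcal{P}(e)|^2$ over unit vectors $e$ are the eigenvalues of the symmetric $2 \times 2$ matrix $P^T P$. The function name is illustrative.

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def angles_between_planes(e1, e2, g1, g2):
    """Return (phi1, phi2) with 0 <= phi1 <= phi2 <= pi/2 for the planes
    L1 = <e1, e2> and L2 = <g1, g2>, both bases being orthonormal."""
    # Column j of P holds the coordinates of the projection of e_j in g1, g2.
    p = [[inner(g1, e1), inner(g1, e2)],
         [inner(g2, e1), inner(g2, e2)]]
    # |P(x1 e1 + x2 e2)|^2 is the quadratic form with matrix P^T P = [[a, b], [b, c]].
    a = p[0][0] ** 2 + p[1][0] ** 2
    b = p[0][0] * p[0][1] + p[1][0] * p[1][1]
    c = p[0][1] ** 2 + p[1][1] ** 2
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam_max = min(1.0, mean + radius)   # squared semimajor axis
    lam_min = max(0.0, mean - radius)   # squared semiminor axis
    return math.acos(math.sqrt(lam_max)), math.acos(math.sqrt(lam_min))
```

As a check in four-dimensional space, take $L_1$ spanned by the first two coordinate vectors and $L_2$ spanned by $(\cos 0.3, 0, \sin 0.3, 0)$ and $(0, \cos 1.1, 0, \sin 1.1)$; the function then returns the angles $0.3$ and $1.1$ up to rounding.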

It now remains for us to check that if both angles between the planes $(L_1, L_2)$
are equal to the corresponding angles between the planes $(L'_1, L'_2)$, then the figures
$S = L_1 \cup L_2$ and $S' = L'_1 \cup L'_2$ will be congruent, that is, there exists an orthogonal
transformation $\mathcal{U}$ taking the plane $L_i$ into $L'_i$, $i = 1, 2$.

Let $\varphi_1$ and $\varphi_2$ be the angles between $L_1$ and $L_2$, equal, by hypothesis, to the angles
between $L'_1$ and $L'_2$. Reasoning as previously (in the case of the angle between a line
and a plane), we can find an orthogonal transformation that takes $L_2$ to $L'_2$. This
implies that we may assume that $L_2 = L'_2$. Let us denote this plane by $L$. Here, of
course, the angles $\varphi_1$ and $\varphi_2$ remain unchanged. Since $\varphi_1 \le \varphi_2$, we have
$\cos\varphi_1 \ge \cos\varphi_2$ for the pair of planes $L_1$ and $L$. This implies that $\cos\varphi_1$ and
$\cos\varphi_2$ are the lengths of the semimajor and semiminor axes of the ellipse that we
considered above. This is also the case for the pair of planes $L'_1$ and $L$. By construction,
this means that $\cos\varphi_1 = |f_1| = |f'_1|$ and $\cos\varphi_2 = |f_2| = |f'_2|$, where the vectors
$f_i \in L$ are orthogonal projections of the vectors $e_i \in L_1$ of length 1. Reasoning
similarly, we obtain the vectors $f'_i \in L$ and $e'_i \in L'_1$, $i = 1, 2$.

Since $|f_1| = |f'_1|$, $|f_2| = |f'_2|$, and since by well-known properties of the
ellipse, its semimajor and semiminor axes are orthogonal, we can find an orthogonal
transformation of the space $M$ that takes $f_1$ to $f'_1$ and $f_2$ to $f'_2$, and having done so,
assume that $f_1 = f'_1$ and $f_2 = f'_2$. But since an ellipse is defined by its semiaxes,
it follows that the ellipses $C$ and $C'$ that are obtained in the plane $L$ from the planes
$L_1$ and $L'_1$ simply coincide. Let us consider the orthogonal projection of the space
$M$ to the plane $L$. Let us denote by $\mathcal{P}$ its restriction to the plane $L_1$, and by $\mathcal{P}'$ its
restriction to the plane $L'_1$.

We shall assume, as we did previously, that the transformations $\mathcal{P}: L_1 \to L$ and
$\mathcal{P}': L'_1 \to L$ are isomorphisms of the corresponding linear spaces, but it is not at all
necessary that they be isomorphisms of Euclidean spaces. Let us represent this with
arrows in a commutative diagram

$$\begin{array}{ccc} L_1 & \xrightarrow{\ \mathcal{P}\ } & L \\ {\scriptstyle \mathcal{V}} \downarrow & \nearrow {\scriptstyle \mathcal{P}'} & \\ L'_1 & & \end{array} \tag{7.38}$$

and let us show that the transformations $\mathcal{P}$ and $\mathcal{P}'$ differ from each other by an
isomorphism of Euclidean spaces $L_1$ and $L'_1$. In other words, we claim that the
transformation $\mathcal{V} = (\mathcal{P}')^{-1}\mathcal{P}$ is an isomorphism of the Euclidean spaces $L_1$ and $L'_1$.

As the product of isomorphisms of linear spaces, the transformation $\mathcal{V}$ is also an
isomorphism, that is, a bijective linear transformation. It remains for us to verify that
$\mathcal{V}$ preserves the inner product. As noted above, to do this, it suffices to verify that
$\mathcal{V}$ preserves the lengths of vectors. Let $x$ be a vector in $L_1$. If $x = 0$, then the vector
$\mathcal{V}(x)$ is equal to $0$ by the linearity of $\mathcal{V}$, and the assertion is obvious. If $x \ne 0$, then
we set $e = \alpha^{-1} x$, where $\alpha = |x|$, and then $|e| = 1$. The vector $\mathcal{P}(e)$ is contained
in the ellipse $C$ in the plane $L$. Since $C = C'$, it follows that $\mathcal{P}(e) = \mathcal{P}'(e')$, where
$e'$ is some vector in the plane $L'_1$, and $|e'| = 1$. From this we obtain the equality
$(\mathcal{P}')^{-1}\mathcal{P}(e) = e'$, that is, $\mathcal{V}(e) = e'$ and $|e'| = 1$, which implies that $|\mathcal{V}(x)| = \alpha =
|x|$, which is what we had to prove.

We shall now consider a basis of the plane $L$ consisting of vectors $f_1$ and $f_2$
lying on the semimajor and semiminor axes of the ellipse $C = C'$, and augment it with
vectors $e_1, e_2$, where $\mathcal{P}(e_i) = f_i$. We thereby obtain four vectors $e_1, e_2, f_1, f_2$ in
the space $L_1 + L$ (it is easily verified that they are linearly independent). Similarly,
in the space $L'_1 + L$, we shall construct four vectors $e'_1, e'_2, f_1, f_2$. We shall show
that there exists an orthogonal transformation of the space $M$ taking the first set of
four vectors into the second. To do so, it suffices to prove that the inner products of
the associated vectors (in the order in which we have written them) coincide. Here
what is least trivial is the relationship $(e'_1, e'_2) = (e_1, e_2)$, but it follows from the fact
that $e'_i = \mathcal{V}(e_i)$, where $\mathcal{V}$ is an isomorphism of the Euclidean spaces $L_1$ and $L'_1$. The
relationship $(e'_1, f_1) = (e_1, f_1)$ is a consequence of the fact that $f_1$ is an orthogonal
projection, so that $(e_1, f_1) = |f_1|^2$, and similarly, $(e'_1, f_1) = |f_1|^2$. The remaining
relationships are even more obvious.

Thus the figures $S = L_1 \cup L_2$ and $S' = L'_1 \cup L'_2$ are congruent if and only if both
angles between the planes $L_1, L_2$ and $L'_1, L'_2$ coincide. With the help of theorems
to be proved in Sect. 7.5, it will be easy for the reader to investigate the case of a
pair of subspaces $L_1, L_2 \subset M$ of arbitrary dimension. In this case, the answer to the
question whether two pairs of subspaces $S = L_1 \cup L_2$ and $S' = L'_1 \cup L'_2$ are congruent
is determined by the agreement of two finite sets of numbers that can be interpreted
as “angles” between the subspaces $L_1, L_2$ and $L'_1, L'_2$.




Example 7.30 When the senior of the two authors of this textbook gave the course
on which it is based (this was probably in 1952 or 1953) at Moscow State
University, he told his students about a question that had arisen in the work of A.N.
Kolmogorov, A.A. Petrov, and N.V. Smirnov, the answer to which in one particular
case had been obtained by A.I. Maltsev. This question was presented by the
professor as an example of an unsolved problem that had been worked on by noted
mathematicians yet could be formulated entirely in the language of linear algebra.
At the next lecture, that is, a week later, one of the students in the class came up to
him and said that he had found a solution to the problem.³

The question posed by A.N. Kolmogorov et al. was this: In a Euclidean space
$L$ of dimension $n$, we are given $n$ nonnull mutually orthogonal vectors $x_1, \ldots, x_n$,
that is, $(x_i, x_j) = 0$ for all $i \ne j$, $i, j = 1, \ldots, n$. For what values $m < n$ does there
exist an $m$-dimensional subspace $M \subset L$ such that the orthogonal projections of the
vectors $x_1, \ldots, x_n$ to it all have the same length? A.I. Maltsev showed that if all
the vectors $x_1, \ldots, x_n$ have the same length, then there exists such a subspace $M$ of
each dimension $m < n$.

The general case is approached as follows. Let us set $|x_i| = \alpha_i$ and assume that
there exists an $m$-dimensional subspace $M$ such that the orthogonal projections of all
vectors $x_i$ to it have the same length $\alpha$. Let us denote by $\mathcal{P}$ the orthogonal projection
onto the subspace $M$, so that $|\mathcal{P}(x_i)| = \alpha$. Let us set $f_i = \alpha_i^{-1} x_i$. Then the vectors
$f_1, \ldots, f_n$ form an orthonormal basis of the space $L$. Conversely, let us select in $L$
an orthonormal basis $e_1, \ldots, e_n$ such that the vectors $e_1, \ldots, e_m$ form a basis in $M$,
that is, for the decomposition

$$L = M \oplus M^\perp, \tag{7.39}$$

we join the orthonormal basis $e_1, \ldots, e_m$ of the subspace $M$ to the orthonormal basis
$e_{m+1}, \ldots, e_n$ of the subspace $M^\perp$.

Let $f_i = \sum_{k=1}^n u_{ki} e_k$. Then we can interpret the matrix $U = (u_{ki})$ as the
matrix of the linear transformation $\mathcal{U}$, written in terms of the basis $e_1, \ldots, e_n$, taking
vectors $e_1, \ldots, e_n$ to vectors $f_1, \ldots, f_n$. Since both sets of vectors $e_1, \ldots, e_n$ and
$f_1, \ldots, f_n$ are orthonormal bases, it follows that $\mathcal{U}$ is an orthogonal
transformation, in particular, by formula (7.18), satisfying the relationship

$$U U^* = E. \tag{7.40}$$


From the decomposition (7.39) we see that every vector $f_i$ can be uniquely
represented in the form of a sum $f_i = u_i + v_i$, where $u_i \in M$ and $v_i \in M^\perp$. By
definition, the orthogonal projection of the vector $f_i$ onto the subspace $M$ is equal to
$\mathcal{P}(f_i) = u_i$. By construction of the basis $e_1, \ldots, e_n$, it follows that

$$\mathcal{P}(f_i) = \sum_{k=1}^m u_{ki} e_k.$$


³ It was published as L.B. Nisnevich, V.I. Bryzgalov, “On a problem of n-dimensional geometry,”
Uspekhi Mat. Nauk 8:4(56) (1953), 169-172.


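By assumption, we have the equalities $|\mathcal{P}(f_i)|^2 = |\mathcal{P}(\alpha_i^{-1} x_i)|^2 = \alpha^2 \alpha_i^{-2}$, which
in coordinates assume the form

$$\sum_{k=1}^m u_{ki}^2 = \alpha^2 \alpha_i^{-2}, \quad i = 1, \ldots, n.$$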


If we sum these relationships for all $i = 1, \ldots, n$ and change the order of summation
in the double sum, then taking into account the relationship (7.40) for the orthogonal
matrix $U$, we obtain the equality

$$\alpha^2 \sum_{i=1}^n \alpha_i^{-2} = \sum_{i=1}^n \sum_{k=1}^m u_{ki}^2 = \sum_{k=1}^m \sum_{i=1}^n u_{ki}^2 = m, \tag{7.41}$$

from which it follows that $\alpha$ can be expressed in terms of $\alpha_1, \ldots, \alpha_n$ and $m$ by the
formula

$$\alpha^2 = m \Big( \sum_{i=1}^n \alpha_i^{-2} \Big)^{-1}. \tag{7.42}$$


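From this, in view of the equalities $|\mathcal{P}(f_i)|^2 = |\mathcal{P}(\alpha_i^{-1} x_i)|^2 = \alpha^2 \alpha_i^{-2}$, we
obtain the expressions

$$|\mathcal{P}(f_i)|^2 = m \Big( \alpha_i^2 \sum_{j=1}^n \alpha_j^{-2} \Big)^{-1}, \quad i = 1, \ldots, n.$$

By Theorem 7.10, we have $|\mathcal{P}(f_i)| \le |f_i|$, and since by construction, $|f_i| = 1$, we
obtain the inequalities

$$m \Big( \alpha_i^2 \sum_{j=1}^n \alpha_j^{-2} \Big)^{-1} \le 1, \quad i = 1, \ldots, n,$$

from which it follows that

$$\alpha_i^2 \sum_{j=1}^n \alpha_j^{-2} \ge m, \quad i = 1, \ldots, n. \tag{7.43}$$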


Thus the inequalities (7.43) are necessary for the solvability of the problem. Let 
us show that they are also sufficient. 
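
Formula (7.42) and the inequalities (7.43) involve only the lengths $\alpha_i$ and the number $m$, so they can be checked mechanically. A minimal sketch follows, assuming the lengths are given as integers or fractions so that exact arithmetic can be used; the function names are illustrative.

```python
from fractions import Fraction

def common_length_squared(alphas, m):
    """alpha^2 from formula (7.42): m divided by the sum of alpha_i^(-2)."""
    return Fraction(m) / sum(Fraction(1) / Fraction(a) ** 2 for a in alphas)

def satisfies_7_43(alphas, m):
    """Check alpha_i^2 * sum_j alpha_j^(-2) >= m for every i."""
    s = sum(Fraction(1) / Fraction(a) ** 2 for a in alphas)
    return all(Fraction(a) ** 2 * s >= m for a in alphas)
```

For three vectors of length 1 and $m = 2$ the condition holds and $\alpha^2 = 2/3$. For lengths $1, 2, 2$ the sum of $\alpha_j^{-2}$ equals $3/2$, so the condition fails for $m = 2$ (the shortest vector gives $3/2 < 2$) but holds for $m = 1$.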

Let us consider first the case $m = 1$. We observe that in this situation, the
inequalities (7.43) are automatically satisfied for an arbitrary collection of positive
numbers $\alpha_1, \ldots, \alpha_n$, since $\alpha_i^2 \sum_{j=1}^n \alpha_j^{-2} \ge \alpha_i^2 \alpha_i^{-2} = 1$. Therefore, for an
arbitrary system of mutually orthogonal vectors $x_1, \ldots, x_n$ in $L$, we must produce a
line $M \subset L$ such that the orthogonal projections of all these vectors onto it have the
same length. For this, we shall take as such a line $M = \langle y \rangle$ with the vector

$$y = \sum_{i=1}^n \frac{(\alpha_1 \cdots \alpha_n)^2}{\alpha_i^2} x_i,$$

where as before, $\alpha_i^2 = (x_i, x_i)$. Since

$$\frac{(x_i, y)}{|y|^2} y \in M \quad \text{and} \quad \Big( x_i - \frac{(x_i, y)}{|y|^2} y, \; y \Big) = 0,$$

it follows that the orthogonal projection of the vector $x_i$ onto the line $M$ is equal to

$$\mathcal{P}(x_i) = \frac{(x_i, y)}{|y|^2} y.$$

Clearly, the length of each such projection

$$|\mathcal{P}(x_i)| = \frac{|(x_i, y)|}{|y|} = \frac{(\alpha_1 \cdots \alpha_n)^2}{|y|}$$

does not depend on the index of the vector $x_i$. Thus we have proved that for an
arbitrary system of $n$ nonnull mutually orthogonal vectors in an $n$-dimensional
Euclidean space, there exists a line such that the orthogonal projections of all vectors
onto it have the same length.
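
The construction of the line for $m = 1$ can be verified numerically. The sketch below makes one simplifying assumption that is not in the text: the mutually orthogonal vectors are taken along the coordinate axes, $x_i = \alpha_i e_i$, which loses no generality up to an orthogonal change of basis. The function names are illustrative.

```python
import math

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

def equalizing_vector(alphas):
    """Return the vectors x_i = alpha_i * e_i and the vector y spanning the line M."""
    n = len(alphas)
    prod_sq = math.prod(a * a for a in alphas)      # (alpha_1 ... alpha_n)^2
    xs = [[alphas[i] if j == i else 0.0 for j in range(n)] for i in range(n)]
    # y = sum_i ((alpha_1 ... alpha_n)^2 / alpha_i^2) x_i
    y = [sum(prod_sq / alphas[i] ** 2 * xs[i][j] for i in range(n))
         for j in range(n)]
    return xs, y

def projection_length(x, y):
    """Length of the orthogonal projection of x onto the line <y>."""
    return abs(inner(x, y)) / math.sqrt(inner(y, y))
```

For the lengths $1, 2, 3$ one obtains $y = (36, 18, 12)$ with $|y| = 42$, and every projection has length $36/42 = 6/7$, which agrees with the value $\alpha = (1 + 1/4 + 1/9)^{-1/2}$ given by (7.42) for $m = 1$.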

To facilitate understanding in what follows, we shall use the symbol $P(m, n)$
to denote the following assertion: If the lengths $\alpha_1, \ldots, \alpha_n$ of a system of
mutually orthogonal vectors $x_1, \ldots, x_n$ in an $n$-dimensional Euclidean space $L$ satisfy
condition (7.43), then there exists an $m$-dimensional subspace $M \subset L$ such that the
orthogonal projections $\mathcal{P}(x_1), \ldots, \mathcal{P}(x_n)$ of the vectors $x_1, \ldots, x_n$ onto it have the
same length $\alpha$, expressed by the formula (7.42). Using this convention, we may say
that we have proved the assertion $P(1, n)$ for all $n > 1$.

Before passing to the case of arbitrary $m$, let us recast the problem in a more
convenient form. Let $\beta_1, \ldots, \beta_n$ be arbitrary numbers satisfying the following
condition:

$$\beta_1 + \cdots + \beta_n = m, \quad 0 < \beta_i \le 1, \quad i = 1, \ldots, n. \tag{7.44}$$

Let us denote by $P'(m, n)$ the following assertion: In the Euclidean space $L$ there
exist an orthonormal basis $g_1, \ldots, g_n$ and an $m$-dimensional subspace $L' \subset L$ such
that the orthogonal projections $\mathcal{P}'(g_i)$ of the basis vectors onto $L'$ have length $\sqrt{\beta_i}$,
that is,

$$|\mathcal{P}'(g_i)|^2 = \beta_i, \quad i = 1, \ldots, n.$$


Lemma 7.31 The assertions $P(m, n)$ and $P'(m, n)$ with a suitable choice of
numbers $\alpha_1, \ldots, \alpha_n$ and $\beta_1, \ldots, \beta_n$ are equivalent.


Proof Let us first prove that the assertion $P'(m, n)$ follows from the assertion
$P(m, n)$. Here we are given a collection of numbers $\beta_1, \ldots, \beta_n$ satisfying the
condition (7.44), and it is known that the assertion $P(m, n)$ holds for arbitrary positive
numbers $\alpha_1, \ldots, \alpha_n$ satisfying condition (7.43). For the numbers $\beta_1, \ldots, \beta_n$ and an
arbitrary orthonormal basis $g_1, \ldots, g_n$, we define vectors $x_i = \beta_i^{-1/2} g_i$, $i = 1, \ldots, n$.
It is clear that these vectors are mutually orthogonal, and furthermore, $|x_i| = \beta_i^{-1/2}$.
Let us prove that the numbers $\alpha_i = \beta_i^{-1/2}$ satisfy the inequalities (7.43). Indeed, if
we take into account the condition (7.44), we have

$$\alpha_i^2 \sum_{j=1}^n \alpha_j^{-2} = \beta_i^{-1} \sum_{j=1}^n \beta_j = \beta_i^{-1} m \ge m.$$

The assertion $P(m, n)$ says that in the space $L$ there exists an $m$-dimensional
subspace $M$ such that the lengths of the orthogonal projections of the vectors $x_i$
onto it are equal to

$$|\mathcal{P}(x_i)| = \alpha = \Big( m \Big( \sum_{i=1}^n \alpha_i^{-2} \Big)^{-1} \Big)^{1/2} = \Big( m \Big( \sum_{i=1}^n \beta_i \Big)^{-1} \Big)^{1/2} = 1.$$

But then the lengths of the orthogonal projections of the vectors $g_i$ onto the same
subspace $M$ are equal to $|\mathcal{P}(g_i)| = |\mathcal{P}(\sqrt{\beta_i}\, x_i)| = \sqrt{\beta_i}$.

Now let us prove that the assertion $P'(m, n)$ yields $P(m, n)$. Here we are given
a collection of nonnull mutually orthogonal vectors $x_1, \ldots, x_n$ of length $|x_i| = \alpha_i$,
and moreover, the numbers $\alpha_i$ satisfy the inequalities (7.43). Let us set

$$\beta_i = \alpha_i^{-2} m \Big( \sum_{j=1}^n \alpha_j^{-2} \Big)^{-1}$$

and verify that the $\beta_i$ satisfy conditions (7.44). The equality $\beta_1 + \cdots + \beta_n = m$ clearly
follows from the definition of the numbers $\beta_i$. From the inequalities (7.43) it follows
that

$$\alpha_i^2 \ge m \Big( \sum_{j=1}^n \alpha_j^{-2} \Big)^{-1},$$

and this implies that

$$\beta_i = \alpha_i^{-2} m \Big( \sum_{j=1}^n \alpha_j^{-2} \Big)^{-1} \le 1.$$

The assertion $P'(m, n)$ says that there exist an orthonormal basis $g_1, \ldots, g_n$ of
the space $L$ and an $m$-dimensional subspace $L' \subset L$ such that the lengths of the
orthogonal projections of the vectors $g_i$ onto it are equal to $|\mathcal{P}'(g_i)| = \sqrt{\beta_i}$. But
then the orthogonal projections of the mutually orthogonal vectors $\beta_i^{-1/2} g_i$ onto
the same subspace $L'$ will have the same length, namely 1.

To prove the assertion $P(m, n)$ for given vectors $x_1, \ldots, x_n$, it now suffices to
consider the linear transformation $\mathcal{U}$ of the space $L$ mapping the vectors $g_i$ to
$\mathcal{U}(g_i) = f_i$, where $f_i = \alpha_i^{-1} x_i$. Since the bases $g_1, \ldots, g_n$ and $f_1, \ldots, f_n$ are
orthonormal, it follows that $\mathcal{U}$ is an orthogonal transformation, and therefore, the
orthogonal projections of the $x_i$ onto the $m$-dimensional subspace $M = \mathcal{U}(L')$ have
the same length. Moreover, by what we have proved above, this length is equal to the
number $\alpha$ determined by formula (7.42). This completes the proof of the lemma.
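
The passage from the lengths $\alpha_i$ to the numbers $\beta_i$ used in the proof of the lemma can be reproduced in exact arithmetic. The sketch below assumes integer or fractional lengths; `betas_from_alphas` is an illustrative name.

```python
from fractions import Fraction

def betas_from_alphas(alphas, m):
    """beta_i = alpha_i^(-2) * m / sum_j alpha_j^(-2), as in the proof of Lemma 7.31."""
    s = sum(Fraction(1) / Fraction(a) ** 2 for a in alphas)
    return [Fraction(m) / (Fraction(a) ** 2 * s) for a in alphas]
```

For the lengths $2, 3, 6$ and $m = 1$ this gives $\beta = (9/14, 2/7, 1/14)$: the sum is $1 = m$ and every $\beta_i$ lies in $(0, 1]$, so condition (7.44) holds, in agreement with the fact that (7.43) is automatic for $m = 1$.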


Thanks to the lemma, we may prove the assertion $P'(m, n)$ instead of the
assertion $P(m, n)$. We shall do so by induction on $m$ and $n$. We have already proved the
base case of the induction ($m = 1$, $n > 1$). The inductive step will be divided into
three parts:

(1) From assertion $P'(m, n)$ for $2m \le n + 1$ we shall derive $P'(m, n + 1)$.

(2) We shall prove that the assertion $P'(m, n)$ implies $P'(n - m, n)$.

(3) We shall prove that the assertion $P'(m + 1, n)$ for all $n > m + 1$ is a consequence
of the assertion $P'(m', n)$ for all $m' \le m$ and $n > m'$.


Part 1: From assertion $P'(m, n)$ for $2m \le n + 1$, we derive $P'(m, n + 1)$. We shall
consider the collection of positive numbers $\beta_1, \ldots, \beta_n, \beta_{n+1}$ satisfying conditions
(7.44) with $n$ replaced by $n + 1$, with $2m \le n + 1$. Without loss of generality, we
may assume that $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_{n+1}$. Since $\beta_1 + \cdots + \beta_{n+1} = m$ and $n + 1 \ge
2m$, it follows that $\beta_n + \beta_{n+1} \le 1$. Indeed, for example for odd $n$, the contrary
assumption would give the inequality

$$\underbrace{\beta_1 + \beta_2 \ge \cdots \ge \beta_n + \beta_{n+1}}_{(n+1)/2 \text{ sums}} > 1,$$

from which clearly follows $\beta_1 + \cdots + \beta_{n+1} > (n + 1)/2 \ge m$, which contradicts the
assumption that has been made.

Let us consider the $(n + 1)$-dimensional Euclidean space $L$ and decompose it as
a direct sum $L = \langle e \rangle \oplus \langle e \rangle^\perp$, where $e \in L$ is an arbitrary vector of length 1. By the
induction hypothesis, the assertion $P'(m, n)$ holds for the numbers $\beta_1, \ldots, \beta_{n-1}$ and
$\beta = \beta_n + \beta_{n+1}$ and the $n$-dimensional Euclidean space $\langle e \rangle^\perp$. This implies that in
the space $\langle e \rangle^\perp$, there exist an orthonormal basis $g_1, \ldots, g_n$ and an $m$-dimensional
subspace $L'$ such that the squares of the lengths of the orthogonal projections of the
vectors $g_i$ onto $L'$ are equal to

$$|\mathcal{P}'(g_i)|^2 = \beta_i, \quad i = 1, \ldots, n - 1, \qquad |\mathcal{P}'(g_n)|^2 = \beta_n + \beta_{n+1}.$$

We shall denote by $\mathcal{P}: L \to L'$ the orthogonal projection of the space $L$ onto
$L'$ (in this case, of course, $\mathcal{P}(e) = 0$), and we construct in $L$ an orthonormal basis
$\bar g_1, \ldots, \bar g_{n+1}$ for which $|\mathcal{P}(\bar g_i)|^2 = \beta_i$ for all $i = 1, \ldots, n + 1$.

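Let us set $\bar g_i = g_i$ for $i = 1, \ldots, n - 1$ and $\bar g_n = a g_n + b e$, $\bar g_{n+1} = c g_n + d e$,
where the numbers $a, b, c, d$ are chosen in such a way that the following conditions
are satisfied:

$$|\bar g_n| = |\bar g_{n+1}| = 1, \quad (\bar g_n, \bar g_{n+1}) = 0, \quad |\mathcal{P}(\bar g_n)|^2 = \beta_n, \quad |\mathcal{P}(\bar g_{n+1})|^2 = \beta_{n+1}. \tag{7.45}$$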




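Then the system of vectors $\bar g_1, \ldots, \bar g_{n+1}$ proves the assertion $P'(m, n + 1)$.
The relationships (7.45) can be rewritten in the form

$$a^2 + b^2 = c^2 + d^2 = 1, \quad ac + bd = 0, \quad a^2 (\beta_n + \beta_{n+1}) = \beta_n, \quad c^2 (\beta_n + \beta_{n+1}) = \beta_{n+1}.$$

It is easily verified that these relationships will be satisfied if we set

$$a = \sqrt{\frac{\beta_n}{\beta_n + \beta_{n+1}}}, \quad c = \sqrt{\frac{\beta_{n+1}}{\beta_n + \beta_{n+1}}}, \quad b = \pm c, \quad d = \mp a.$$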

Before proceeding to part 2, let us make the following observation.

Proposition 7.32 To prove the assertion $P'(m, n)$, we may assume that $\beta_i < 1$ for
all $i = 1, \ldots, n$.


Proof Let $1 = \beta_1 = \cdots = \beta_k > \beta_{k+1} \ge \cdots \ge \beta_n > 0$. We choose in the
$n$-dimensional vector space $L$ an arbitrary subspace $L_k$ of dimension $k$ and consider
the orthogonal decomposition $L = L_k \oplus L_k^\perp$. We note that

$$1 > \beta_{k+1} \ge \cdots \ge \beta_n > 0 \quad \text{and} \quad \beta_{k+1} + \cdots + \beta_n = m - k.$$

Therefore, if the assertion $P'(m - k, n - k)$ holds for the numbers $\beta_{k+1}, \ldots, \beta_n$,
then in $L_k^\perp$ there exist a subspace $L'_k$ of dimension $m - k$ and an orthonormal basis
$g_{k+1}, \ldots, g_n$ such that $|\mathcal{P}(g_i)|^2 = \beta_i$ for $i = k + 1, \ldots, n$, where $\mathcal{P}: L_k^\perp \to L'_k$ is
an orthogonal projection.

We now set $L' = L_k \oplus L'_k$ and choose in $L_k$ an arbitrary orthonormal basis
$g_1, \ldots, g_k$. Then if $\mathcal{P}': L \to L'$ is the orthogonal projection, we have that
$|\mathcal{P}'(g_i)|^2 = 1$ for $i = 1, \ldots, k$ and $|\mathcal{P}'(g_i)|^2 = \beta_i$ for $i = k + 1, \ldots, n$.


Part 2: Assertion $P'(m,n)$ implies assertion $P'(n-m, n)$. Let us consider n
numbers $\beta_1 \ge \cdots \ge \beta_n$ satisfying condition (7.44) in which the number m is
replaced by $n - m$. We must construct an orthogonal projection $\mathcal{P}' : L \to L'$ of the
n-dimensional Euclidean space L onto an $(n-m)$-dimensional subspace $L'$ and
an orthonormal basis $g_1, \ldots, g_n$ in L for which the conditions $|\mathcal{P}'(g_i)|^2 = \beta_i$,
$i = 1, \ldots, n$, are satisfied. By a previous observation, we may assume that all $\beta_i$ are
less than 1. Then the numbers $\bar{\beta}_i = 1 - \beta_i$ satisfy conditions (7.44), and by assertion
$P'(m,n)$, there exist an orthogonal projection $\bar{\mathcal{P}} : L \to \bar{L}$ of the space L onto an
m-dimensional subspace $\bar{L}$ and an orthonormal basis $g_1, \ldots, g_n$ for which the
conditions $|\bar{\mathcal{P}}(g_i)|^2 = \bar{\beta}_i$ are satisfied. For the desired $(n-m)$-dimensional subspace
we shall take $L' = \bar{L}^{\perp}$ and denote by $\mathcal{P}'$ the orthogonal projection onto $L'$. Then for
each $i = 1, \ldots, n$, the equalities

$$g_i = \bar{\mathcal{P}}(g_i) + \mathcal{P}'(g_i), \qquad 1 = |g_i|^2 = |\bar{\mathcal{P}}(g_i)|^2 + |\mathcal{P}'(g_i)|^2 = \bar{\beta}_i + |\mathcal{P}'(g_i)|^2$$




are satisfied, from which it follows that $|\mathcal{P}'(g_i)|^2 = 1 - \bar{\beta}_i = \beta_i$.

Part 3: Assertion $P'(m+1, n)$ for all $n > m+1$ is a consequence of $P'(m', n)$
for all $m' \le m$ and $n > m'$. By our assumption, the assertion $P'(m,n)$ holds in
particular for $n = 2m+1$. By part 2, we may assert that $P'(m+1, 2m+1)$ holds,
and since $2(m+1) \le (2m+1) + 1$, then by virtue of part 1, we may conclude that
$P'(m+1, n)$ holds for all $n \ge 2m+1$. It remains to prove the assertions $P'(m+1, n)$
for $m+2 \le n \le 2m$. But these assertions follow from $P'(n-(m+1), n)$ by part 2.
It is necessary only to verify that the inequalities $1 \le n-(m+1) \le m$ are satisfied,
which follows directly from the assumption that $m+2 \le n \le 2m$.


7.5 Symmetric Transformations 


As we observed at the beginning of Sect. 7.1, for a Euclidean space L, there exists
a natural isomorphism $L \xrightarrow{\sim} L^*$ that allows us to identify in this case the space $L^*$
with L. In particular, using the definition given in Sect. 3.7, we may define for an
arbitrary basis $e_1, \ldots, e_n$ of the space L the dual basis $f_1, \ldots, f_n$ of the space L by
the condition $(f_i, e_i) = 1$, $(f_i, e_j) = 0$ for $i \neq j$. Thus an orthonormal basis is one
that is its own dual.

In the same way, we can assume that for an arbitrary linear transformation
$\mathcal{A} : L \to L$, the dual transformation $\mathcal{A}^* : L^* \to L^*$ defined in Sect. 3.7 is a linear
transformation of the Euclidean space L into itself and is determined by the
condition


(A*(x), y) = (x, A(y)) (7.46) 


for all vectors $x, y \in L$. By Theorem 3.81, the matrix of the linear transformation $\mathcal{A}$
in an arbitrary basis of the space L and the matrix of the dual transformation $\mathcal{A}^*$ in
the dual basis are transposes of each other. In particular, the matrices of the
transformations $\mathcal{A}$ and $\mathcal{A}^*$ in an arbitrary orthonormal basis are transposes of each other.
This is in accord with the notation $A^*$ that we have chosen for the transpose matrix.
It is easily verified also that conversely, if the matrices of transformations $\mathcal{A}$ and $\mathcal{B}$
in some orthonormal basis are transposes of each other, then the transformations $\mathcal{A}$
and $\mathcal{B}$ are dual.

As an example, let us consider an orthogonal transformation $\mathcal{U}$, for which
by definition, the condition $(\mathcal{U}(x), \mathcal{U}(y)) = (x, y)$ is satisfied. By formula
(7.46), we have the equality $(\mathcal{U}(x), \mathcal{U}(y)) = (x, \mathcal{U}^*\mathcal{U}(y))$, from which follows
$(x, \mathcal{U}^*\mathcal{U}(y)) = (x, y)$. This implies that $(x, \mathcal{U}^*\mathcal{U}(y) - y) = 0$ for all vectors x,
from which follows the equality $\mathcal{U}^*\mathcal{U}(y) = y$ for all vectors $y \in L$. In other words,
the fact that $\mathcal{U}^*\mathcal{U}$ is equal to $\mathcal{E}$, the identity transformation, is equivalent to the
property of orthogonality of the transformation $\mathcal{U}$. In matrix form, this is the
relationship (7.18).
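As a small numerical illustration of the relationship $\mathcal{U}^*\mathcal{U} = \mathcal{E}$, the following Python sketch takes the matrix of a plane rotation in an orthonormal basis (an example chosen here for illustration, not one taken from the text) and checks that the product of its transpose with itself is the identity matrix up to rounding. The helper names `transpose`, `matmul`, `rotation`, and `is_identity` are hypothetical.

```python
import math

def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rotation(theta):
    """Matrix of the rotation of the plane through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def is_identity(m, tol=1e-12):
    """True if m equals the identity matrix up to the tolerance tol."""
    n = len(m)
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

U = rotation(0.7)
print(is_identity(matmul(transpose(U), U)))
```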


Definition 7.33 A linear transformation A of a Euclidean space is called symmetric 
or self-dual if A* = A. 




In other words, for a symmetric transformation A and arbitrary vectors x and y, 
the following condition must be satisfied: 


(A(x), y) = (x, A(y)), (7.47) 


that is, the bilinear form g(x, y) = (A(x), y) is symmetric. As we have seen, from 
this it follows that in an arbitrary orthonormal basis, the matrix of the transformation 
A is symmetric. 

Symmetric linear transformations play a very large role in mathematics and its 
applications. Their most essential applications relate to quantum mechanics, where 
symmetric transformations of infinite-dimensional Hilbert space (see the note on 
p. 214) correspond to what are called observed physical quantities. We shall, how- 
ever, restrict our attention to finite-dimensional spaces. As we shall see in the sequel, 
even with this restriction, the theory of symmetric linear transformations has a great 
number of applications. 

The following theorem gives a basic property of symmetric linear transforma- 
tions of finite-dimensional Euclidean spaces. 


Theorem 7.34 Every symmetric linear transformation of a real vector space has an 
eigenvector. 


In view of the very large number of applications of this theorem, we shall present 
three proofs, based on different principles. 


Proof of Theorem 7.34 First proof. Let $\mathcal{A}$ be a symmetric linear transformation
of a Euclidean space L. If dim L > 2, then by Theorem 4.22, it has a one- or
two-dimensional invariant subspace $L'$. It is obvious that the restriction of the
transformation $\mathcal{A}$ to the invariant subspace $L'$ is also a symmetric transformation. If dim $L'$ = 1,
then we have $L' = \langle e \rangle$, where $e \neq 0$, and this implies that e is an eigenvector.
Consequently, to prove the theorem, it suffices to show that a symmetric linear
transformation in the two-dimensional subspace $L'$ has an eigenvector. Choosing in $L'$ an
orthonormal basis, we obtain for $\mathcal{A}$ a symmetric matrix in this basis:

$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$


In order to find an eigenvector of the transformation $\mathcal{A}$, we must find a real root of
the polynomial $|A - tE|$. This polynomial has the form

$$(a - t)(c - t) - b^2 = t^2 - (a + c)t + ac - b^2$$

and has a real root if and only if its discriminant is nonnegative. But the discriminant
of this quadratic trinomial is equal to

$$(a + c)^2 - 4(ac - b^2) = (a - c)^2 + 4b^2 \ge 0,$$

and the proof is complete.
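The two-dimensional computation in the first proof can be sketched in Python as follows. This is only an illustration under the assumption that the matrix is given by its three entries a, b, c; the function names `symmetric_2x2_eigenvalues` and `eigenvector_for` are hypothetical. The discriminant $(a - c)^2 + 4b^2$ is a sum of squares, so the square root below is always defined.

```python
import math

def symmetric_2x2_eigenvalues(a, b, c):
    """Real roots of t^2 - (a + c) t + (a c - b^2) for the matrix [[a, b], [b, c]]."""
    disc = (a - c) ** 2 + 4 * b ** 2      # discriminant, a sum of squares
    root = math.sqrt(disc)
    return ((a + c - root) / 2, (a + c + root) / 2)

def eigenvector_for(a, b, c, t):
    """A nonzero solution (x, y) of (a - t) x + b y = 0 and b x + (c - t) y = 0."""
    if b != 0:
        return (b, t - a)
    # Diagonal matrix: the coordinate vectors are eigenvectors.
    return (1.0, 0.0) if abs(t - a) <= abs(t - c) else (0.0, 1.0)

a, b, c = 2.0, 1.0, 2.0
for t in symmetric_2x2_eigenvalues(a, b, c):
    x, y = eigenvector_for(a, b, c, t)
    # Residual of A v - t v; both components should vanish up to rounding.
    print(t, (a * x + b * y - t * x, b * x + c * y - t * y))
```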




Second proof. The second proof is based on the complexification $L^{\mathbb{C}}$ of the real
vector space L. Following the construction presented in Sect. 4.3, we may extend
the transformation $\mathcal{A}$ to the vectors of the space $L^{\mathbb{C}}$. By Theorem 4.18, the obtained
transformation $\mathcal{A}^{\mathbb{C}} : L^{\mathbb{C}} \to L^{\mathbb{C}}$ will already have an eigenvector $e \in L^{\mathbb{C}}$ and
eigenvalue $\lambda \in \mathbb{C}$, so that $\mathcal{A}^{\mathbb{C}}(e) = \lambda e$.

We shall extend the inner product (x, y) from the space L to $L^{\mathbb{C}}$ so that it
determines there a Hermitian form (see the definition on p. 210). It is clear that this
can be accomplished in only one way: defining two vectors $a_1 = x_1 + i y_1$ and
$a_2 = x_2 + i y_2$ of the space $L^{\mathbb{C}}$, we obtain the inner product according to the
formula

$$(a_1, a_2) = (x_1, x_2) + (y_1, y_2) + i\bigl((y_1, x_2) - (x_1, y_2)\bigr). \tag{7.48}$$


The verification of the fact that the inner product $(a_1, a_2)$ thus defined actually
determines in $L^{\mathbb{C}}$ a Hermitian form is reduced to the verification of sesquilinearity (in
this case, it suffices to consider separately the product of a vector $a_1$ and a vector $a_2$
by a real number and by i) and the property of being Hermitian. Here all calculations
are completely trivial, and we shall omit them.

An important new property of the inner product $(a_1, a_2)$ that we have obtained is
its positive definiteness: the scalar square $(a, a)$ is real (this follows
from the Hermitian property) and $(a, a) > 0$ for $a \neq 0$ (this is a direct consequence of
formula (7.48) for $x_1 = x_2$, $y_1 = y_2$). It is obvious that for the new inner product
we also have an analogue of the relationship (7.47), that is,

$$(\mathcal{A}^{\mathbb{C}}(a_1), a_2) = (a_1, \mathcal{A}^{\mathbb{C}}(a_2)); \tag{7.49}$$

in other words, the form $\varphi(a_1, a_2) = (\mathcal{A}^{\mathbb{C}}(a_1), a_2)$ is Hermitian. Let us apply (7.49)
to the vectors $a_1 = a_2 = e$. Then we obtain $(\lambda e, e) = (e, \lambda e)$. Taking into
account the Hermitian property, we have the equalities $(\lambda e, e) = \lambda(e, e)$ and
$(e, \lambda e) = \bar{\lambda}(e, e)$, from which it follows that $\lambda(e, e) = \bar{\lambda}(e, e)$. Since $(e, e) > 0$, we derive
from this that $\lambda = \bar{\lambda}$, that is, the number $\lambda$ is real. Thus the characteristic
polynomial $|\mathcal{A}^{\mathbb{C}} - t\mathcal{E}|$ of the transformation $\mathcal{A}^{\mathbb{C}}$ has a real root $\lambda$. But a basis of the space
L as a space over $\mathbb{R}$ is a basis of the space $L^{\mathbb{C}}$ over $\mathbb{C}$, and the matrix of the
transformation $\mathcal{A}^{\mathbb{C}}$ in this basis coincides with the matrix of the transformation $\mathcal{A}$. In
other words, $|\mathcal{A}^{\mathbb{C}} - t\mathcal{E}| = |\mathcal{A} - t\mathcal{E}|$, which implies that the characteristic
polynomial $|\mathcal{A} - t\mathcal{E}|$ of the transformation $\mathcal{A}$ has a real root $\lambda$, and this implies that the
transformation $\mathcal{A} : L \to L$ has an eigenvector in the space L.

Third proof. The third proof rests on certain facts from analysis, which we now 
introduce. We first observe that a Euclidean space can be naturally converted into a 
metric space by defining the distance r(x, y) between two vectors x and y by the 
relationship r(x, y) = |x — y|. Thus in the Euclidean space L we have the notions of 
convergence, limit, continuous functions, and closed and bounded sets; see p. xvii. 

The Bolzano-Weierstrass theorem asserts that for an arbitrary closed and
bounded set X in a finite-dimensional Euclidean space L and arbitrary continuous
function g(x) on X there exists a vector $x_0 \in X$ at which g(x) assumes its
maximum value: that is, $g(x_0) \ge g(x)$ for all $x \in X$. This theorem is well known
from real analysis in the case that the set X is an interval of the real line. Its proof in
the general case is exactly the same and is usually presented somewhat later. Here
we shall use the theorem without offering a proof.

Let us apply the Bolzano—Weierstrass theorem to the set X consisting of all vec- 
tors x of the space L such that |x| = 1, that is, to the sphere of radius 1, and to the 
function g(x) = (x, A(x)). This function is continuous not only on X, but also on 
the entire space L. Indeed, it suffices to choose in the space L an arbitrary basis and 
to write down in it the inner product (x, A(x)) as a quadratic form in the coordinates 
of the vector x. Of importance to us is solely the fact that as a result, we obtain a 
polynomial in the coordinates. After this, it suffices to use the well-known theorem 
that states that the sum and product of continuous functions are continuous. Then 
the question is reduced to a verification of the fact that an arbitrary coordinate of the 
vector x is a continuous function of x, but this is completely obvious. 

Thus the function $(x, \mathcal{A}(x))$ assumes its maximum over the set X at some $x_0 = e$.
Let us denote this value by $\lambda$. Consequently, $(x, \mathcal{A}(x)) \le \lambda$ for every x for which
|x| = 1. For every nonnull vector y, we set x = y/|y|. Then |x| = 1, and applying
to this vector the inequality above, we see that $(y, \mathcal{A}(y)) \le \lambda(y, y)$ for all y (this
obviously holds as well for y = 0).

Let us prove that the number $\lambda$ is an eigenvalue of the transformation $\mathcal{A}$. To this
end, let us write the condition that defines $\lambda$ in the form

$$(y, \mathcal{A}(y)) \le \lambda(y, y), \qquad \lambda = (e, \mathcal{A}(e)), \qquad |e| = 1, \tag{7.50}$$

for an arbitrary vector $y \in L$.

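Let us apply (7.50) to the vector $y = e + \varepsilon z$, where both the scalar $\varepsilon$ and the vector
$z \in L$ are thus far arbitrary. Expanding the expressions
$(y, \mathcal{A}(y)) = (e + \varepsilon z, \mathcal{A}(e) + \varepsilon\mathcal{A}(z))$ and $(y, y) = (e + \varepsilon z, e + \varepsilon z)$, we obtain the inequality

$$(e, \mathcal{A}(e)) + \varepsilon(e, \mathcal{A}(z)) + \varepsilon(z, \mathcal{A}(e)) + \varepsilon^2(z, \mathcal{A}(z))
\le \lambda\bigl((e, e) + \varepsilon(e, z) + \varepsilon(z, e) + \varepsilon^2(z, z)\bigr).$$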


In view of the symmetry of the transformation $\mathcal{A}$, on the basis of the properties of
Euclidean spaces and recalling that $(e, e) = 1$, $(e, \mathcal{A}(e)) = \lambda$, after canceling the
common term $(e, \mathcal{A}(e)) = \lambda(e, e)$ on both sides of the above inequality, we obtain

$$2\varepsilon(e, \mathcal{A}(z) - \lambda z) + \varepsilon^2\bigl((z, \mathcal{A}(z)) - \lambda(z, z)\bigr) \le 0. \tag{7.51}$$

Let us now note that every expression $a\varepsilon + b\varepsilon^2$ in the case $a \neq 0$ assumes a
positive value for some $\varepsilon$. For this it is necessary to choose a value $|\varepsilon|$ sufficiently
small that $a + b\varepsilon$ has the same sign as a, and then to choose the appropriate sign
for $\varepsilon$. Thus the inequality (7.51) always leads to a contradiction except in the case
$(e, \mathcal{A}(z) - \lambda z) = 0$.

If for some vector $z \neq 0$, we have $\mathcal{A}(z) = \lambda z$, then z is an eigenvector of the
transformation $\mathcal{A}$ with eigenvalue $\lambda$, which is what we wished to prove. But if
$\mathcal{A}(z) - \lambda z \neq 0$ for all $z \neq 0$, then the kernel of the transformation $\mathcal{A} - \lambda\mathcal{E}$ is equal
to (0). From Theorem 3.68 it follows that then the transformation $\mathcal{A} - \lambda\mathcal{E}$ is an
isomorphism, and its image is equal to all of the space L. This implies that for
arbitrary $u \in L$, it is possible to choose a vector $z \in L$ such that $u = \mathcal{A}(z) - \lambda z$. Then
taking into account the relationship $(e, \mathcal{A}(z) - \lambda z) = 0$, we obtain that an arbitrary
vector $u \in L$ satisfies the equality $(e, u) = 0$. But this is impossible at least for u = e,
since |e| = 1.
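The idea of the third proof can be tried numerically in the plane. The Python sketch below is an illustration only: it replaces the exact maximum guaranteed by the Bolzano-Weierstrass theorem with a maximum over finitely many sample points of the unit circle, and the matrix and the function names `quadratic_form` and `max_on_unit_circle` are assumptions made here. The sampled maximum of $(x, \mathcal{A}(x))$ plays the role of $\lambda$, and the point where it is attained plays the role of e.

```python
import math

def quadratic_form(matrix, x):
    """Value of (x, A(x)) for the matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(n) for j in range(n))

def max_on_unit_circle(matrix, samples=3600):
    """Largest sampled value of (x, A(x)) on |x| = 1 and a point attaining it."""
    best_value, best_vector = -math.inf, None
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        x = (math.cos(theta), math.sin(theta))
        value = quadratic_form(matrix, x)
        if value > best_value:
            best_value, best_vector = value, x
    return best_value, best_vector

A = [[2, 1], [1, 2]]
lam, e = max_on_unit_circle(A)
Ae = [A[0][0] * e[0] + A[0][1] * e[1], A[1][0] * e[0] + A[1][1] * e[1]]
# If e is an eigenvector with eigenvalue lam, both residuals are close to zero.
print(lam, [Ae[i] - lam * e[i] for i in range(2)])
```

A finite sample only approximates the maximum in general; for this particular matrix the maximizing direction happens to lie on the sampling grid.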


The further theory of symmetric transformations is constructed on the basis of 
some very simple considerations. 


Theorem 7.35 If a subspace $L'$ of a Euclidean space L is invariant with respect
to the symmetric transformation $\mathcal{A}$, then its orthogonal complement $(L')^{\perp}$ is also
invariant.


Proof The result is a direct consequence of the definitions. Let y be a vector in
$(L')^{\perp}$. Then $(x, y) = 0$ for all $x \in L'$. In view of the symmetry of the transformation
$\mathcal{A}$, we have the relationship

$$(x, \mathcal{A}(y)) = (\mathcal{A}(x), y),$$

while taking into account the invariance of $L'$ yields that $\mathcal{A}(x) \in L'$. This implies
that $(x, \mathcal{A}(y)) = 0$ for all vectors $x \in L'$, that is, $\mathcal{A}(y) \in (L')^{\perp}$, and this completes
the proof of the theorem.


Combining Theorems 7.34 and 7.35 yields a fundamental result in the theory of 
symmetric transformations. 


Theorem 7.36 For every symmetric transformation A of a Euclidean space L of 
finite dimension, there exists an orthonormal basis of this space consisting of eigen- 
vectors of the transformation A. 


Proof The proof is by induction on the dimension of the space L. Indeed, by
Theorem 7.34, the transformation $\mathcal{A}$ has at least one eigenvector e. Let us set

$$L = \langle e \rangle \oplus \langle e \rangle^{\perp},$$

where $\langle e \rangle^{\perp}$ has dimension n − 1, and by Theorem 7.35, is invariant with respect
to $\mathcal{A}$. By the induction hypothesis, in the space $\langle e \rangle^{\perp}$ there exists a required basis. If
we add the vector e, normalized to unit length, to this basis, we obtain the desired basis in L.
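Theorem 7.36 guarantees that an orthonormal eigenbasis exists; it does not prescribe how to compute one. The Python sketch below uses the classical Jacobi rotation method, which is a swapped-in numerical technique and not the inductive argument of the proof: it repeatedly applies plane rotations that zero one off-diagonal entry at a time. The function name `jacobi_eigen` and the tolerance values are assumptions made for this illustration.

```python
import math

def jacobi_eigen(a, sweeps=50, tol=1e-24):
    """Return (eigenvalues, eigenvectors) of a real symmetric matrix.

    eigenvectors[k] is a unit vector belonging to eigenvalues[k].
    """
    n = len(a)
    a = [[float(t) for t in row] for row in a]          # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle (at most pi/4 in absolute value) zeroing a[p][q].
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # columns: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # rows: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):      # accumulate the rotations: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    eigenvalues = [a[i][i] for i in range(n)]
    eigenvectors = [[v[i][k] for i in range(n)] for k in range(n)]
    return eigenvalues, eigenvectors

values, vectors = jacobi_eigen([[2, 1, 0], [1, 2, 1], [0, 1, 2]])
print(sorted(values))
```

The columns of the accumulated rotation matrix form the orthonormal basis; in that basis the transformation has a diagonal matrix.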


Let us discuss this result. For a symmetric transformation $\mathcal{A}$, we have an
orthonormal basis $e_1, \ldots, e_n$ consisting of eigenvectors. But to what extent is such a
basis uniquely determined? Suppose the vector $e_i$ has the associated eigenvalue $\lambda_i$.
Then in our basis, the transformation $\mathcal{A}$ has matrix

$$A = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \tag{7.52}$$


But as we saw in Sect. 4.1, the eigenvalues of a linear transformation $\mathcal{A}$ coincide
with the roots of the characteristic polynomial

$$|\mathcal{A} - t\mathcal{E}| = |A - tE| = \prod_{i=1}^{n} (\lambda_i - t).$$


Thus the eigenvalues $\lambda_1, \ldots, \lambda_n$ of the transformation $\mathcal{A}$ are uniquely determined.
Suppose that the distinct values among them are $\lambda_1, \ldots, \lambda_k$. If we assemble all the
vectors of the constructed orthonormal basis that correspond to one and the same
eigenvalue $\lambda_i$ (from the set $\lambda_1, \ldots, \lambda_k$ of distinct eigenvalues) and consider the
subspace spanned by them, then we obviously obtain the eigensubspace $L_{\lambda_i}$ (see the
definition on p. 138). We then have the orthogonal decomposition

$$L = L_{\lambda_1} \oplus \cdots \oplus L_{\lambda_k}, \quad \text{where } L_{\lambda_i} \perp L_{\lambda_j} \text{ for all } i \neq j. \tag{7.53}$$


The restriction of $\mathcal{A}$ to the eigensubspace $L_{\lambda_i}$ gives the transformation $\lambda_i\mathcal{E}$, and in this
subspace, every orthonormal basis consists of eigenvectors (with eigenvalue $\lambda_i$).

Thus we see that a symmetric transformation $\mathcal{A}$ uniquely defines only the
eigensubspaces $L_{\lambda_i}$, while in each of them, one can choose an orthonormal basis as one
likes. On combining these bases, we obtain an arbitrary basis of the space L
satisfying the conditions of Theorem 7.36.

Let us note that every eigenvector of the transformation $\mathcal{A}$ lies in one of the
subspaces $L_{\lambda_i}$. If two eigenvectors x and y are associated with different eigenvalues
$\lambda_i \neq \lambda_j$, then they lie in different subspaces $L_{\lambda_i}$ and $L_{\lambda_j}$, and in view of the
orthogonality of the decomposition (7.53), they must be orthogonal. We thus obtain the
following result.


Theorem 7.37 The eigenvectors of a symmetric transformation corresponding to 
different eigenvalues are orthogonal. 


We note that this theorem can also be easily proved by direct calculation. 


Proof of Theorem 7.37 Let x and y be eigenvectors of a symmetric transformation
$\mathcal{A}$ corresponding to distinct eigenvalues $\lambda_i$ and $\lambda_j$. Let us substitute the expressions
$\mathcal{A}(x) = \lambda_i x$ and $\mathcal{A}(y) = \lambda_j y$ into the equality $(\mathcal{A}(x), y) = (x, \mathcal{A}(y))$. From this
we obtain $(\lambda_i - \lambda_j)(x, y) = 0$, and since $\lambda_i \neq \lambda_j$, we have $(x, y) = 0$.


Theorem 7.36 is often formulated conveniently as a theorem about quadratic
forms using Theorem 6.3 from Sect. 6.1 and the possibility of identifying the space
$L^*$ with L if the space L is equipped with an inner product. Indeed, Theorem 6.3
shows that every bilinear form g on a Euclidean space L can be represented in the
form

$$g(x, y) = (x, \mathcal{A}(y)), \tag{7.54}$$

where $\mathcal{A}$ is the linear transformation of the space L to $L^*$ uniquely defined by the
bilinear form g; that is, if we make the identification of $L^*$ with L, it is a transformation
of the space L into itself.

It is obvious that the symmetry of the transformation A coincides with the sym- 
metry of the bilinear form g. Therefore, the bijection between symmetric bilin- 
ear forms and linear transformations established above yields the same correspon- 
dence between quadratic forms and symmetric linear transformations of a Euclidean 
space L. Moreover, in view of relationship (7.54), to the symmetric transformation
$\mathcal{A}$ there corresponds the quadratic form

$$\psi(x) = (x, \mathcal{A}(x)),$$

and every quadratic form $\psi(x)$ has a unique representation in this form.

If in some basis $e_1, \ldots, e_n$, the transformation $\mathcal{A}$ has a diagonal matrix (7.52),
then for the vector $x = x_1 e_1 + \cdots + x_n e_n$, the quadratic form $\psi(x)$ has in this basis
the canonical form

$$\psi(x) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2. \tag{7.55}$$


Thus Theorem 7.36 is equivalent to the following. 


Theorem 7.38 For any quadratic form in a finite-dimensional Euclidean space, 
there exists an orthonormal basis in which it has the canonical form (7.55). 


Theorem 7.38 is sometimes conveniently formulated as a theorem about arbitrary 
vector spaces. 


Theorem 7.39 For two quadratic forms in a finite-dimensional vector space, one of 
which is positive definite, there exists a basis (not necessarily orthonormal) in which 
they both have canonical form (7.55). 


In this case, we say that in a suitable basis, these quadratic forms are reduced to
a sum of squares (even if there are negative coefficients $\lambda_i$ in formula (7.55)).


Proof of Theorem 7.39 Let $\psi_1(x)$ and $\psi_2(x)$ be two such quadratic forms, one of
which, let it be $\psi_1(x)$, is positive definite. By Theorem 6.10, there exists, in the
vector space L in question, a basis in which the form $\psi_1(x)$ has the canonical form
(7.55). Since by assumption, the quadratic form $\psi_1(x)$ is positive definite, it follows
that in formula (7.55), all the numbers $\lambda_i$ are positive, and therefore, there exists a
basis $e_1, \ldots, e_n$ of the space L in which $\psi_1(x)$ is brought into the form

$$\psi_1(x) = x_1^2 + \cdots + x_n^2. \tag{7.56}$$




Let us consider as the scalar product (x, y) in the space L the symmetric bilinear
form g(x, y), associated by Theorem 6.6 with the quadratic form $\psi_1(x)$. We thereby
convert L into a Euclidean space.

As can be seen from formulas (6.14) and (7.56), the basis $e_1, \ldots, e_n$ for this inner
product is orthonormal. Then by Theorem 7.38, there exists an orthonormal basis
$e_1', \ldots, e_n'$ of the space L in which the form $\psi_2(x)$ has canonical form (7.55). But
since the basis $e_1', \ldots, e_n'$ is orthonormal with respect to the inner product that we
defined with the help of the quadratic form $\psi_1(x)$, then in this basis, $\psi_1(x)$ as before
takes the form (7.56), and that completes the proof of the theorem.
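The two steps of this proof can be followed numerically for n = 2. The Python sketch below is an illustration under stated assumptions: the positive definite form is first brought to a sum of squares by a Cholesky factorization (one possible choice of basis, not necessarily the one produced by Theorem 6.10), and the second form is then diagonalized by a plane rotation, which is orthogonal with respect to the new inner product. The function name `simultaneous_reduction_2x2` and the helpers are hypothetical.

```python
import math

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def simultaneous_reduction_2x2(a1, a2):
    """For symmetric 2x2 matrices a1 (positive definite) and a2, return a
    nonsingular C with C^T a1 C = I and C^T a2 C diagonal."""
    # Step 1: Cholesky factorization a1 = L L^T with L lower triangular.
    l11 = math.sqrt(a1[0][0])
    l21 = a1[1][0] / l11
    l22 = math.sqrt(a1[1][1] - l21 ** 2)
    # C1 is the inverse of L^T, so that C1^T a1 C1 = I.
    c1 = [[1 / l11, -l21 / (l11 * l22)], [0.0, 1 / l22]]
    # Step 2: matrix of the second form in the new basis (still symmetric).
    b = matmul(matmul(transpose(c1), a2), c1)
    # Step 3: rotation Q that diagonalizes b; Q^T Q = I keeps the first form.
    diff = b[1][1] - b[0][0]
    theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * b[0][1] / diff)
    c, s = math.cos(theta), math.sin(theta)
    q = [[c, s], [-s, c]]
    return matmul(c1, q)

A1 = [[2.0, 1.0], [1.0, 2.0]]     # positive definite
A2 = [[0.0, 1.0], [1.0, 0.0]]     # indefinite
C = simultaneous_reduction_2x2(A1, A2)
print(matmul(matmul(transpose(C), A1), C))
print(matmul(matmul(transpose(C), A2), C))
```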


Remark 7.40 It is obvious that Theorem 7.39 remains true if in its formulation we
replace the condition of positive definiteness of one of the forms by the condition
of negative definiteness. Indeed, if $\psi(x)$ is a negative definite quadratic form, then
the form $-\psi(x)$ is positive definite, and both of these assume canonical form in one
and the same basis.

Without the assumption of positive (or negative) definiteness of one of the
quadratic forms, Theorem 7.39 is no longer true. To prove this, let us derive one
necessary (but not sufficient) condition for two quadratic forms $\psi_1(x)$ and $\psi_2(x)$ to
be simultaneously reduced to a sum of squares. Let $A_1$ and $A_2$ be their matrices in
some basis. If the quadratic forms $\psi_1(x)$ and $\psi_2(x)$ are simultaneously reducible to
sums of squares, then in some other basis, their matrices $A_1'$ and $A_2'$ will be diagonal,
that is,

$$A_1' = \begin{pmatrix} \alpha_1 & 0 & \cdots & 0 \\ 0 & \alpha_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \alpha_n \end{pmatrix}, \qquad
A_2' = \begin{pmatrix} \beta_1 & 0 & \cdots & 0 \\ 0 & \beta_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \beta_n \end{pmatrix}.$$


Then the polynomial $|A_1' t + A_2'|$ is equal to $\prod_{i=1}^{n} (\alpha_i t + \beta_i)$, that is, it can be factored
as a product of linear factors $\alpha_i t + \beta_i$. But by formula (6.10) for replacing the matrix
of a bilinear form through a change of basis, the matrices $A_1$, $A_1'$ and $A_2$, $A_2'$ are
related by

$$A_1' = C^* A_1 C, \qquad A_2' = C^* A_2 C,$$

where C is some nonsingular matrix, that is, $|C| \neq 0$. Therefore,

$$|A_1' t + A_2'| = |C^*(A_1 t + A_2)C| = |C^*|\,|A_1 t + A_2|\,|C|,$$

from which taking into account the equality $|C^*| = |C|$, we obtain the relationship

$$|A_1 t + A_2| = |C|^{-2}\,|A_1' t + A_2'|,$$

from which it follows that the polynomial $|A_1 t + A_2|$ can also be factored into
linear factors. Thus for two quadratic forms $\psi_1(x)$ and $\psi_2(x)$ with matrices $A_1$ and
$A_2$ to be simultaneously reduced each to a sum of squares, it is necessary that the
polynomial $|A_1 t + A_2|$ be factorable into real linear factors.




Now for n = 2 we set $\psi_1(x) = x_1^2 - x_2^2$ and $\psi_2(x) = 2x_1 x_2$. These quadratic forms
are neither positive definite nor negative definite. Their matrices have the form


$$A_1 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

and it is obvious that the polynomial $|A_1 t + A_2| = -(t^2 + 1)$ cannot be factored into
real linear factors. This implies that the quadratic forms $\psi_1(x)$ and $\psi_2(x)$ cannot
simultaneously be reduced to sums of squares.
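The necessary condition can be checked mechanically for 2 × 2 matrices. The Python sketch below is an illustration with hypothetical function names: it expands the determinant of $A_1 t + A_2$ into the coefficients of a quadratic polynomial in t and tests the sign of its discriminant, assuming the leading coefficient $|A_1|$ is nonzero.

```python
def det_pencil_coefficients(a1, a2):
    """Coefficients (p, q, r) of det(a1 * t + a2) = p t^2 + q t + r for 2x2 matrices."""
    p = a1[0][0] * a1[1][1] - a1[0][1] * a1[1][0]
    q = (a1[0][0] * a2[1][1] + a2[0][0] * a1[1][1]
         - a1[0][1] * a2[1][0] - a2[0][1] * a1[1][0])
    r = a2[0][0] * a2[1][1] - a2[0][1] * a2[1][0]
    return p, q, r

def has_real_linear_factors(p, q, r):
    """True if p t^2 + q t + r (with p != 0) splits into real linear factors."""
    return q * q - 4 * p * r >= 0

A1 = [[1, 0], [0, -1]]
A2 = [[0, 1], [1, 0]]
coefficients = det_pencil_coefficients(A1, A2)
print(coefficients, has_real_linear_factors(*coefficients))
```

For the matrices of the example the coefficients are those of $-(t^2 + 1)$, and the test reports that no real factorization exists. Replacing $A_1$ by the identity matrix, which is positive definite, gives $t^2 - 1$, which does factor, in agreement with Theorem 7.39.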

The question of reducing pairs of quadratic forms with complex coefficients to 
sums of squares (with the help of a complex linear transformation) is examined in 
detail, for instance, in the book The Theory of Matrices, by F.R. Gantmacher. See 
the references section. 


Remark 7.41 The last proof of Theorem 7.34 that we gave makes it possible to
interpret the largest eigenvalue $\lambda$ of a symmetric transformation $\mathcal{A}$ as the maximum
of the quadratic form $(x, \mathcal{A}(x))$ on the sphere |x| = 1. Let $\lambda_i$ be the other
eigenvalues, so that $(x, \mathcal{A}(x)) = \lambda_1 x_1^2 + \cdots + \lambda_n x_n^2$. Then $\lambda$ is the greatest among the
$\lambda_i$. Indeed, let us assume that the eigenvalues are numbered in descending order:
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. Then

$$\lambda_1 x_1^2 + \cdots + \lambda_n x_n^2 \le \lambda_1 (x_1^2 + \cdots + x_n^2),$$

and the maximum value of the form $(x, \mathcal{A}(x))$ on the sphere |x| = 1 is equal to $\lambda_1$
(it is attained at the vector with coordinates $x_1 = 1$, $x_2 = \cdots = x_n = 0$). This implies
that $\lambda_1 = \lambda$.

There is an analogous characterization of the other eigenvalues $\lambda_i$ as well, namely
the Courant-Fischer theorem, which we shall present without proof. Let us consider
all possible vector subspaces $L' \subset L$ of dimension k. We restrict the quadratic form
$(x, \mathcal{A}(x))$ to the subspace $L'$ and examine its values at the intersection of $L'$ with the
unit sphere, that is, the set of all vectors $x \in L'$ that satisfy |x| = 1. By the
Bolzano-Weierstrass theorem, the restriction of the form $(x, \mathcal{A}(x))$ to $L'$ assumes a maximum
value $\lambda'$ at some point of the sphere, which, of course, depends on the subspace $L'$.
The Courant-Fischer theorem asserts that the smallest number thus obtained (as the
subspace $L'$ ranges over all subspaces of dimension k) is equal to the eigenvalue
$\lambda_{n-k+1}$.


Remark 7.42 Eigenvectors are connected with the question of finding maxima and
minima. Let $f(x_1, \ldots, x_n)$ be a real-valued differentiable function of n real
variables. A point at which all the derivatives of the function f with respect to the
variables $(x_1, \ldots, x_n)$, that is, the derivatives in all directions from this point, are
equal to zero is called a critical point of the function. It is proved in real analysis
that with some natural constraints, this condition is necessary (but not sufficient) for
the function f to assume a maximum or minimum value at the point in question.
Let us consider a quadratic form $f(x) = (x, \mathcal{A}(x))$ on the unit sphere |x| = 1. It is
not difficult to show that for an arbitrary point on this sphere, all points sufficiently
close to it can be written in some system of coordinates such that our function f
can be viewed as a function of these coordinates. Then the critical points of the
function $(x, \mathcal{A}(x))$ are exactly those points of the sphere that are eigenvectors of
the symmetric transformation $\mathcal{A}$.

Fig. 7.9 An ellipsoid


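Example 7.43 Let an ellipsoid be given in three-dimensional space with coordinates
x, y, z by the equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1. \tag{7.57}$$

The expression on the left-hand side of (7.57) can be written in the form
$\psi(\boldsymbol{x}) = (\boldsymbol{x}, \mathcal{A}(\boldsymbol{x}))$, where

$$\boldsymbol{x} = (x, y, z), \qquad \mathcal{A}(\boldsymbol{x}) = \left(\frac{x}{a^2}, \frac{y}{b^2}, \frac{z}{c^2}\right).$$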

Let us assume that 0 < a < b < c. Then the maximum value that the quadratic form
$\psi(\boldsymbol{x})$ takes on the sphere $|\boldsymbol{x}| = 1$ is $\lambda = 1/a^2$. It is attained on the vectors (±1, 0, 0).
If $|\psi(\boldsymbol{x})| \le \lambda$ for $|\boldsymbol{x}| = 1$, then for an arbitrary vector $\boldsymbol{y} \neq 0$, setting $\boldsymbol{x} = \boldsymbol{y}/|\boldsymbol{y}|$, we
obtain $|\psi(\boldsymbol{y})| \le \lambda|\boldsymbol{y}|^2$. For the vector $\boldsymbol{y} = 0$, this inequality is obvious. Therefore,
it holds in general for all $\boldsymbol{y}$. For $|\psi(\boldsymbol{y})| = 1$, it then follows that $|\boldsymbol{y}|^2 \ge 1/\lambda$. This
implies that the shortest vectors $\boldsymbol{y}$ satisfying equation (7.57) are the vectors (±a, 0, 0).
The line segments beginning at the point (0, 0, 0) and ending at the points (±a, 0, 0)
are called the semiminor axes of the ellipsoid (sometimes, this same term denotes
their length). Similarly, the smallest value that the quadratic form $\psi(\boldsymbol{x})$ attains on
the sphere $|\boldsymbol{x}| = 1$ is equal to $1/c^2$. It attains this value at the vectors (0, 0, ±1) on the
unit sphere. Line segments corresponding to the vectors (0, 0, ±c) are called semimajor
axes of the ellipsoid. A vector (0, ±b, 0) corresponds to a critical point of the
quadratic form $\psi(\boldsymbol{x})$ that is neither a maximum nor a minimum. Such a point is
called a minimax, that is, in moving from this point in one direction, the function
$\psi(\boldsymbol{x})$ will increase, while in moving in another direction it will decrease (see
Fig. 7.9). The line segments corresponding to the vectors (0, ±b, 0) are called the
median semiaxes of the ellipsoid.
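The extreme values in this example can be confirmed on a grid of points of the unit sphere. The Python sketch below is an illustration only; the semiaxes a = 1, b = 2, c = 3, the grid resolution, and the function names `psi` and `unit_sphere_grid` are assumptions chosen here. The largest sampled value should be close to $1/a^2$ and the smallest close to $1/c^2$.

```python
import math

def psi(v, a, b, c):
    """Left-hand side of the ellipsoid equation (7.57) at the vector v."""
    x, y, z = v
    return x * x / a ** 2 + y * y / b ** 2 + z * z / c ** 2

def unit_sphere_grid(steps=60):
    """Points of the unit sphere on a latitude-longitude grid."""
    for i in range(steps + 1):
        polar = math.pi * i / steps
        for j in range(2 * steps):
            azimuth = math.pi * j / steps
            yield (math.sin(polar) * math.cos(azimuth),
                   math.sin(polar) * math.sin(azimuth),
                   math.cos(polar))

a, b, c = 1.0, 2.0, 3.0
values = [psi(v, a, b, c) for v in unit_sphere_grid()]
print(max(values), 1 / a ** 2)   # maximum, attained near (±1, 0, 0)
print(min(values), 1 / c ** 2)   # minimum, attained near (0, 0, ±1)
```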


Everything presented thus far in this chapter (with the exception of Sect. 7.3
on the orientation of a real Euclidean space) can be transferred verbatim to complex
Euclidean spaces if the inner product is defined using the positive definite Hermitian
form g(x, y). The condition of positive definiteness means that for the associated
quadratic Hermitian form $\psi(x) = g(x, x)$, the inequality $\psi(x) > 0$ is satisfied for
all $x \neq 0$. If we denote, as before, the inner product by (x, y), the last condition can
be written in the form (x, x) > 0 for all $x \neq 0$.

The dual transformation $\mathcal{A}^*$, as previously, is defined by condition (7.46). But
now, the matrix of the transformation $\mathcal{A}^*$ in an orthonormal basis is obtained from
the matrix of the transformation $\mathcal{A}$ not simply by taking the transpose, but by taking
the complex conjugate of the transpose. The analogue of a symmetric transformation
is defined as a transformation $\mathcal{A}$ whose associated bilinear form $(x, \mathcal{A}(y))$ is
Hermitian.

It is a fundamental fact that in quantum mechanics, one deals with complex space. 
We can formulate what was stated earlier in the following form: observed physical 
quantities correspond to Hermitian forms in infinite-dimensional complex Hilbert 
space. 

The theory of Hermitian transformations in the finite-dimensional case is con- 
structed even more simply than the theory of symmetric transformations in real 
spaces, since there is no need to prove analogues of Theorem 7.34: we know already 
that an arbitrary linear transformation of a complex vector space has an eigenvector. 
From the definition of being Hermitian, it follows that the eigenvalues of a Her- 
mitian transformation are real. The theorems proved in this section are valid for 
Hermitian forms (with the same proofs). 

In the complex case, a transformation U preserving the inner product is called 
unitary. The reasoning carried out in Sect. 7.2 shows that for a unitary transforma- 
tion U, there exists an orthonormal basis consisting of eigenvectors, and all eigen- 
values of the transformation U are complex numbers of modulus 1. 


7.6 Applications to Mechanics and Geometry* 


We shall present two examples from two different areas—mechanics and geome- 
try—in which the theorems of the previous section play a key role. Since these 
questions will be taken up in other courses, we shall allow ourselves to be brief in 
both the definitions and the proofs. 


Example 7.44 Let us consider the motion of a mechanical system in a small
neighborhood of its equilibrium position. One says that such a system possesses n degrees
of freedom if in some region, its state is determined by n so-called generalized
coordinates $q_1, \ldots, q_n$, which we shall consider the coordinates of a vector q in some
coordinate system, and where we will take the origin 0 to be the equilibrium position
of our system. The motion of the system determines the dependence of the vector
q on time t. We shall assume that the equilibrium position under investigation is
determined by a strict local minimum of its potential energy $\Pi$. If this value is
equal to c, and the potential energy is a function $\Pi(q_1, \ldots, q_n)$ in the generalized
coordinates (it is assumed that it does not depend on time), then this implies that
$\Pi(0, \ldots, 0) = c$ and $\Pi(q_1, \ldots, q_n) > c$ for all remaining values $q_1, \ldots, q_n$ close to
zero. From the fact that a critical point of the function $\Pi$ corresponds to the minimum
value, we may conclude that at the point 0, all partial derivatives $\partial\Pi/\partial q_i$
become zero. Therefore, for an expansion of the function $\Pi(q_1, \ldots, q_n)$ as a series
in powers of the variables $q_1, \ldots, q_n$ at the point 0, the linear terms will be equal
to zero, and we obtain the expression $\Pi(q_1, \ldots, q_n) = c + \sum_{i,j=1}^{n} b_{ij} q_i q_j + \cdots$,
where $b_{ij}$ are certain constants, and the ellipsis indicates terms of degree greater
than 2. Since we are considering motions not far from the point 0, we can disregard
those terms. It is in this approximation that we shall consider this problem. That is,
we set

$$\Pi(q_1, \ldots, q_n) = c + \sum_{i,j=1}^{n} b_{ij} q_i q_j.$$

Since $\Pi(q_1, \ldots, q_n) > c$ for all values $q_1, \ldots, q_n$ not equal to zero, the quadratic
form $\sum_{i,j=1}^{n} b_{ij} q_i q_j$ will be positive definite.

Kinetic energy T is a quadratic form in so-called generalized velocities $dq_1/dt$,
..., $dq_n/dt$, which are also denoted by $\dot{q}_1, \ldots, \dot{q}_n$, that is,

$$T = \sum_{i,j=1}^{n} a_{ij} \dot{q}_i \dot{q}_j, \tag{7.58}$$

where $a_{ij} = a_{ji}$ are functions of q (we assume that they do not depend on time t).
Considering as we did for potential energy only those values $q_i$ close to zero, we
may replace all the functions $a_{ij}$ in (7.58) by constants $a_{ij}(0)$, which is what we
shall now assume. Kinetic energy is always positive except in the case that all $\dot{q}_i$ are
equal to 0, and therefore, the quadratic form (7.58) is positive definite.

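Motion in a broad class of mechanical systems (so-called natural systems) is
described by a rather complex system of differential equations, the second-order
Lagrange equations:

$$\frac{d}{dt}\left(\frac{\partial T}{\partial \dot{q}_i}\right) - \frac{\partial T}{\partial q_i} = -\frac{\partial \Pi}{\partial q_i}, \qquad i = 1, \ldots, n. \tag{7.59}$$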


Application of Theorem 7.39 makes it possible to reduce these equations in the
given situation to much simpler ones. To this end, let us find a coordinate system
in which the quadratic form $\sum_{i,j=1}^{n} a_{ij} x_i x_j$ can be brought into the form $\sum_{i=1}^{n} x_i^2$
and the quadratic form $\sum_{i,j=1}^{n} b_{ij} x_i x_j$ into the form $\sum_{i=1}^{n} \lambda_i x_i^2$. In this case,
the form $\sum_{i,j=1}^{n} b_{ij} x_i x_j$ is positive definite, which implies that all $\lambda_i$ are positive.
In this system of coordinates (we shall again denote them by $q_1, \ldots, q_n$), the system
of equations (7.59) is decomposed into the independent equations

$$\frac{d^2 q_i}{dt^2} = -\lambda_i q_i, \qquad i = 1, \ldots, n, \tag{7.60}$$


which have the solutions $q_i = c_i \cos\sqrt{\lambda_i}\,t + d_i \sin\sqrt{\lambda_i}\,t$, where $c_i$ and $d_i$ are
arbitrary constants. This shows that “small oscillations” are periodic in each coordinate
$q_i$. Since they are bounded, it follows that our equilibrium position 0 is stable. If
we were to examine the state of equilibrium at a point that was a critical point of
potential energy $\Pi$ but not a strict minimum, then in the equations (7.60) we would
not be able to guarantee that all the $\lambda_i$ were positive. Then for those $i$ for which
$\lambda_i < 0$, we would obtain the solutions
$q_i = c_i \cosh\sqrt{-\lambda_i}\,t + d_i \sinh\sqrt{-\lambda_i}\,t$, which
can grow without bound as $t$ grows. Similarly, for $\lambda_i = 0$, we would obtain
an unbounded solution $q_i = c_i + d_i t$.
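The three families of solutions of (7.60) can be checked numerically. The following Python sketch is only an illustration: the function names `solution` and `ode_residual` and the finite-difference step `h` are assumptions introduced here, not part of the text. It evaluates the closed-form solution for a given sign of $\lambda_i$ and estimates how well it satisfies $d^2q/dt^2 = -\lambda q$ using a central difference.

```python
import math

def solution(lam, c, d, t):
    """Closed-form solution of q'' = -lam * q for the three sign cases of lam."""
    if lam > 0:
        w = math.sqrt(lam)
        return c * math.cos(w * t) + d * math.sin(w * t)    # bounded, periodic
    if lam < 0:
        w = math.sqrt(-lam)
        return c * math.cosh(w * t) + d * math.sinh(w * t)  # may grow without bound
    return c + d * t                                        # linear growth

def ode_residual(lam, c, d, t, h=1e-4):
    """|q''(t) + lam * q(t)|, with q'' approximated by a central difference."""
    q = lambda s: solution(lam, c, d, s)
    second = (q(t + h) - 2 * q(t) + q(t - h)) / h ** 2
    return abs(second + lam * q(t))

for lam in (2.0, -1.5, 0.0):
    print(lam, ode_residual(lam, 1.0, 0.5, 0.7))
```

The printed residuals are small but not exactly zero, because the second derivative is only approximated. For $\lambda > 0$ the values of `solution` stay within $|c| + |d|$ for all $t$, which is the boundedness used in the stability argument.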

Strictly speaking, we have done only the following altogether: we have replaced 
the given conditions of our problem with conditions close to them, with the result 
that the problem became much simpler. Such a procedure is usual in the theory of 
differential equations, where it is proved that solutions to a simplified system of 
equations are in a certain sense similar to the solutions of the initial system. And 
moreover, the degree of this deviation can be estimated as a function of the values 
of the terms that we have ignored. This estimation takes place in a finite interval of 
time whose length also depends on the value of the ignored terms. This justifies the 
simplifications that we have made. 

A beautiful example, which played an important role historically, is given by
lateral oscillations of a string of beads.⁴

Suppose we have a weightless and ideally flexible thread fixed at the ends. On it
are securely fastened $n$ beads with masses $m_1, \ldots, m_n$, and suppose they divide the
thread into segments of lengths $l_0, l_1, \ldots, l_n$. We shall assume that in its initial state,
the thread lies along the $x$-axis, and we shall denote by $y_1, \ldots, y_n$ the displacements
of the beads along the $y$-axis. Then the kinetic energy of this system has the form

\[ T = \frac{1}{2} \sum_{i=1}^{n} m_i \dot y_i^2. \]

Assuming the tension of the thread to be constant (as we may because the displacements
are small) and equal to $\sigma$, we obtain for the potential energy the expression
$\Pi = \sigma \Delta l$, where $\Delta l = \sum_{i=0}^{n} \Delta l_i$ is the change in length of the entire thread, and
$\Delta l_i$ is the change in length of the portion of the thread corresponding to $l_i$. Then we
know the $\Delta l_i$ in terms of the $l_i$:

\[ \Delta l_i = \sqrt{l_i^2 + (y_{i+1} - y_i)^2} - l_i, \quad i = 0, \ldots, n, \]

where $y_0 = y_{n+1} = 0$. Expanding this expression as a series in $y_{i+1} - y_i$, we obtain
the quadratic terms $\sum_{i=0}^{n} \frac{1}{2 l_i} (y_{i+1} - y_i)^2$, and we may set

\[ \Pi = \frac{\sigma}{2} \sum_{i=0}^{n} \frac{1}{l_i} (y_{i+1} - y_i)^2, \quad y_0 = y_{n+1} = 0. \]
. L 


⁴ This example is taken from Gantmacher and Krein’s book Oscillation Matrices and Kernels and
Small Vibrations of Mechanical Systems, Moscow 1950, English translation, AMS Chelsea
Publishing, 2002.




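Thus in this case, the problem is reduced to simultaneously expressing two quadratic
forms in the variables $y_1, \ldots, y_n$ as sums of squares:

\[ T = \frac{1}{2} \sum_{i=1}^{n} m_i \dot y_i^2, \qquad
   \Pi = \frac{\sigma}{2} \sum_{i=0}^{n} \frac{1}{l_i} (y_{i+1} - y_i)^2, \qquad y_0 = y_{n+1} = 0. \]

But if the masses of all the beads are equal and they divide the thread into equal
segments, that is, $m_i = m$ and $l_i = l/(n+1)$, $i = 1, \ldots, n$, then all the formulas can
be written in a more explicit form. In this case, we are speaking about the simultaneous
representation as the sum of squares of two forms:

\[ T = \frac{m}{2} \sum_{i=1}^{n} \dot y_i^2, \qquad
   \Pi = \frac{\sigma (n+1)}{l} \left( \sum_{i=1}^{n} y_i^2 - \sum_{i=0}^{n} y_i y_{i+1} \right), \qquad y_0 = y_{n+1} = 0. \]

Therefore, we must use an orthogonal transformation (preserving the form $\sum_{i=1}^{n} y_i^2$)
to express as a sum of squares the form $\sum_{i=0}^{n} y_i y_{i+1}$ with matrix

\[ A = \frac{1}{2}
\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
1 & 0 & 1 & \cdots & 0 & 0 \\
0 & 1 & 0 & \ddots & \vdots & \vdots \\
\vdots & \vdots & \ddots & \ddots & 1 & 0 \\
0 & 0 & \cdots & 1 & 0 & 1 \\
0 & 0 & \cdots & 0 & 1 & 0
\end{pmatrix}. \]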
It would have been possible to take the standard route: find the eigenvalues
$\lambda_1, \ldots, \lambda_n$ as roots of the determinant $|A - tE|$ and eigenvectors $y$ from the system
of equations

\[ Ay = \lambda y, \tag{7.61} \]

where $\lambda = \lambda_i$ and $y$ is the column of unknowns $y_1, \ldots, y_n$. But it is simpler to
use equations (7.61) directly. They give a system of $n$ equations in the unknowns
$y_1, \ldots, y_n$:

\[ y_2 = 2\lambda y_1, \quad y_1 + y_3 = 2\lambda y_2, \quad \ldots, \quad
   y_{n-2} + y_n = 2\lambda y_{n-1}, \quad y_{n-1} = 2\lambda y_n, \]

which can be written in the form

\[ y_{k-1} + y_{k+1} = 2\lambda y_k, \quad k = 1, \ldots, n, \tag{7.62} \]

where we set $y_0 = y_{n+1} = 0$. The system of equations (7.62) is called a recurrence
relation, whereby each value $y_{k+1}$ is expressed in terms of the two preceding values
$y_k$ and $y_{k-1}$. Thus if we know two adjacent values, then we can use relationship
(7.62) to construct all the $y_k$. The condition $y_0 = y_{n+1} = 0$ is called a boundary
condition.
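The way the recurrence (7.62) builds all the $y_k$ from $y_0 = 0$ and $y_1$ can be sketched in a few lines of Python. The function name `propagate` is a hypothetical helper introduced for illustration. The case $n = 2$ is chosen because the matrix $A$ is then $\frac{1}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, whose eigenvalues $\pm\frac{1}{2}$ can be checked by hand.

```python
def propagate(lam, y1, n):
    """Return [y_0, ..., y_{n+1}] built from y_{k+1} = 2*lam*y_k - y_{k-1}, y_0 = 0."""
    ys = [0.0, y1]
    for k in range(1, n + 1):
        ys.append(2 * lam * ys[k] - ys[k - 1])
    return ys

# lam = 1/2 is an eigenvalue for n = 2: the boundary value y_3 vanishes.
print(propagate(0.5, 1.0, 2))   # [0.0, 1.0, 1.0, 0.0]
# lam = 1 is not an eigenvalue: y_k = k * y_1, so y_3 is nonzero.
print(propagate(1.0, 1.0, 2))   # [0.0, 1.0, 2.0, 3.0]
```

A value of $\lambda$ is an eigenvalue of $A$ exactly when the last entry $y_{n+1}$ of the propagated sequence is zero for some $y_1 \ne 0$.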

Let us note that for $\lambda = \pm 1$, the equation (7.62) with boundary condition $y_0 =
y_{n+1} = 0$ has only the null solution: $y_0 = \cdots = y_{n+1} = 0$. Indeed, for $\lambda = 1$, we
obtain

\[ y_2 = 2y_1, \quad y_3 = 3y_1, \quad \ldots, \quad y_n = n y_1, \quad y_{n+1} = (n+1) y_1, \]

from which by $y_{n+1} = 0$ it follows that $y_1 = 0$, and all $y_k$ are equal to 0. Similarly,
for $\lambda = -1$, we obtain

\[ y_2 = -2y_1, \quad y_3 = 3y_1, \quad y_4 = -4y_1, \quad \ldots, \quad
   y_n = (-1)^{n-1} n y_1, \quad y_{n+1} = (-1)^{n} (n+1) y_1, \]

from which by $y_{n+1} = 0$ it follows as well that $y_1 = 0$, and again all the $y_k$ are equal
to zero. Thus for $\lambda = \pm 1$, the system of equations (7.61) has as its only solution
the vector $y = 0$, which by definition, cannot be an eigenvector. In other words, this
implies that the numbers $\pm 1$ are not eigenvalues of the matrix $A$.

There is a lovely formula for solving equation (7.62) with boundary condition
$y_0 = y_{n+1} = 0$. Let us denote by $\alpha$ and $\beta$ the roots of the quadratic equation
$z^2 - 2\lambda z + 1 = 0$. By the above reasoning, $\lambda \ne \pm 1$, and therefore, the numbers
$\alpha$ and $\beta$ are distinct and cannot equal $\pm 1$. Direct substitution shows that then for
arbitrary $A$ and $B$, the sequence $y_k = A\alpha^k + B\beta^k$ satisfies the relationship (7.62).
We choose the coefficients $A$ and $B$ so that $y_0 = 0$ and $y_1$ has its given value. The following $y_k$, as
we have seen, are determined by the relationship (7.62), and this implies that again
they are given by our formula. The conditions $y_0 = 0$, $y_1$ fixed give $B = -A$ and
$A(\alpha - \beta) = y_1$, whence $A = y_1/(\alpha - \beta)$. Thus we obtain the expression

\[ y_k = \frac{y_1}{\alpha - \beta} \left( \alpha^k - \beta^k \right). \tag{7.63} \]

We now use the condition $y_{n+1} = 0$, which gives $\alpha^{n+1} = \beta^{n+1}$. Moreover, since
$\alpha$ and $\beta$ are roots of the polynomial $z^2 - 2\lambda z + 1$, we have $\alpha\beta = 1$, whence $\beta = \alpha^{-1}$,
which implies that $\alpha^{2(n+1)} = 1$. From this (taking into account that $\alpha \ne \pm 1$), we
obtain

\[ \alpha = \cos\frac{\pi j}{n+1} + i \sin\frac{\pi j}{n+1}, \]

where $i$ is the imaginary unit, and the number $j$ assumes the values $1, \ldots, 2n+1$ other
than $n+1$ (which would give $\alpha = -1$). Again
using the equation $z^2 - 2\lambda z + 1 = 0$, whose roots are $\alpha$ and $\beta$, we obtain $n$ distinct
values for $\lambda$:

\[ \lambda_j = \cos\left(\frac{\pi j}{n+1}\right), \quad j = 1, \ldots, n, \]

since $j = n+2, \ldots, 2n+1$ give the same values $\lambda_j$. These are precisely the eigenvalues
of the matrix $A$. For the eigenvector $y_j$ associated with the eigenvalue $\lambda_j$, we
obtain by formula (7.63) its coordinates $y_{1j}, \ldots, y_{nj}$ in the form

\[ y_{kj} = \sin\left(\frac{\pi k j}{n+1}\right), \quad k = 1, \ldots, n. \]

These formulas were derived by d’Alembert and Daniel Bernoulli. Passing to the
limit as $n \to \infty$, Lagrange derived from these the law of vibrations of a uniform
string.
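The eigenvalue and eigenvector formulas for the string of beads can be verified numerically. The sketch below uses only the Python standard library. The helper names `bead_matrix`, `eigenpair`, and `residual` are assumptions made for this illustration. It builds the matrix with entries $\frac{1}{2}$ next to the diagonal and checks that $Ay = \lambda_j y$ holds for $\lambda_j = \cos(\pi j/(n+1))$ and $y_{kj} = \sin(\pi k j/(n+1))$.

```python
import math

def bead_matrix(n):
    """The n x n matrix with 1/2 on the two diagonals adjacent to the main one."""
    return [[0.5 if abs(r - c) == 1 else 0.0 for c in range(n)] for r in range(n)]

def eigenpair(n, j):
    """Eigenvalue cos(pi*j/(n+1)) and eigenvector (sin(pi*k*j/(n+1))), k = 1..n."""
    theta = math.pi * j / (n + 1)
    return math.cos(theta), [math.sin(k * theta) for k in range(1, n + 1)]

def residual(n, j):
    """Largest absolute entry of A*y - lambda*y for the j-th pair."""
    a = bead_matrix(n)
    lam, y = eigenpair(n, j)
    ay = [sum(a[r][c] * y[c] for c in range(n)) for r in range(n)]
    return max(abs(ay[r] - lam * y[r]) for r in range(n))

n = 6
for j in range(1, n + 1):
    lam, _ = eigenpair(n, j)
    print(j, round(lam, 6), residual(n, j) < 1e-12)
```

The residuals vanish up to rounding error, and the $n$ values $\lambda_j$ are pairwise distinct because the cosine is strictly decreasing on $(0, \pi)$.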


Example 7.45 Let us consider in an $n$-dimensional real Euclidean space $L$ the subset
$X$ given by the equation

\[ F(x_1, \ldots, x_n) = 0 \tag{7.64} \]

in some coordinate system. Such a subset $X$ is called a hypersurface and consists of
all vectors $x = (x_1, \ldots, x_n)$ of the Euclidean space $L$ whose coordinates satisfy the
equation⁵ (7.64). Using the change-of-coordinates formula (3.36), we see that the
property of the subset $X \subset L$ being a hypersurface does not depend on the choice
of coordinates, that is, on the choice of the basis of $L$. If we assume that the
beginning of every vector is located at a single fixed point, then every vector
$x = (x_1, \ldots, x_n)$ can be identified with its endpoint, a point of the given space. In order
to conform to more customary terminology, as we continue with this example, we
shall call the vectors $x$ of which the hypersurface $X$ consists its points.

We shall assume that $F(0) = 0$ and that the function $F(x_1, \ldots, x_n)$ is differentiable
in each of its arguments as many times as necessary. It is easily verified that
this condition also does not depend on the choice of basis. Let us assume in addition
that 0 is not a critical point of the hypersurface $X$, that is, that not all partial
derivatives $\partial F(0)/\partial x_i$ are equal to zero. In other words, if we introduce the vector
$\operatorname{grad} F = (\partial F/\partial x_1, \ldots, \partial F/\partial x_n)$, called the gradient of the function $F$, then this
implies that $\operatorname{grad} F(0) \ne 0$.

We shall be interested in local properties of the hypersurface $X$, that is, properties
associated with points close to 0. With the assumptions that we have made,
the implicit function theorem, known from analysis, shows that near 0, the coordinates
$x_1, \ldots, x_n$ of each point of the hypersurface $X$ can be represented as functions
of $n - 1$ arguments $u_1, \ldots, u_{n-1}$, and furthermore, for each point, the values
$u_1, \ldots, u_{n-1}$ are uniquely determined. It is possible to choose as $u_1, \ldots, u_{n-1}$ some
$n - 1$ of the coordinates $x_1, \ldots, x_n$, after determining the remaining coordinate $x_k$
from equation (7.64). For this, only the condition $\frac{\partial F}{\partial x_k}(0) \ne 0$ must be satisfied for
the given $k$, which holds for some $k$ because of the assumption $\operatorname{grad} F(0) \ne 0$. The functions
that determine the dependence of the coordinates $x_1, \ldots, x_n$ of a point of the
hypersurface $X$ on the arguments $u_1, \ldots, u_{n-1}$ are differentiable in all arguments as many
times as the original function $F(x_1, \ldots, x_n)$.


⁵ The more customary point of view, when the hypersurface (for example, a curve or surface)
consists of points, requires the consideration of an n-dimensional space consisting of points
(otherwise known as an affine space), which will be introduced in the following chapter.




The hyperplane defined by the equation

\[ \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(0)\, x_i = 0 \]

is called the tangent space or tangent hyperplane to the hypersurface $X$ at the point
0 and is denoted by $T_0X$. In the case that the basis of the Euclidean space $L$ is
orthonormal, this equation can also be written in the form $(\operatorname{grad} F(0), x) = 0$. As a
subspace of the Euclidean space $L$, the tangent space $T_0X$ is also a Euclidean space.

The set of vectors depending on the parameter $t$ taking values on some interval
of the real line, that is, $x(t) = (x_1(t), \ldots, x_n(t))$, is called a smooth curve if all
functions $x_i(t)$ are differentiable a sufficient number of times and if for every value
of the parameter $t$, not all the derivatives $dx_i/dt$ are equal to zero. In analogy to
what was said above about hypersurfaces, we may visualize the curve as consisting
of points $A(t)$, where each $A(t)$ is the endpoint of some vector $x(t)$, while all the
vectors $x(t)$ begin at a certain fixed point $O$. In what follows, we shall refer to the
vectors $x$ that constitute the curve as its points.

We say that a curve $\gamma$ passes through the point $x_0$ if $x(t_0) = x_0$ for some value
of the parameter $t_0$. It is clear that here we may always assume that $t_0 = 0$. Indeed,
let us consider a different curve $\tilde x(t) = (\tilde x_1(t), \ldots, \tilde x_n(t))$, where the functions $\tilde x_i(t)$
are equal to $x_i(t + t_0)$. This can also be written in the form $\tilde x(\tau) = x(t)$, where we
have introduced a new parameter $\tau$ related to the old one by $\tau = t - t_0$.

Generally speaking, for a curve we may make an arbitrary change of parameter
by the formula $t = \psi(\tau)$, where the function $\psi$ defines a continuously differentiable
bijective mapping of one interval to another. Under such a change, a curve, considered
as a set of points (or vectors), will remain the same. From this it follows that one
and the same curve can be written in a variety of ways using various parameters.⁶


We now introduce the vector $\frac{dx}{dt} = \left( \frac{dx_1}{dt}, \ldots, \frac{dx_n}{dt} \right)$. Suppose the curve $\gamma$ passes
through the point 0 for $t = 0$. Then the vector $p = \frac{dx}{dt}(0)$ is called a tangent vector
to the curve $\gamma$ at the point 0. It depends, of course, on the choice of parameter $t$
defining the curve. Under a change of parameter $t = \psi(\tau)$, we have

\[ \frac{dx}{d\tau} = \frac{dx}{dt} \cdot \frac{dt}{d\tau} = \frac{dx}{dt} \cdot \psi'(\tau), \tag{7.65} \]

and the tangent vector $p$ is multiplied by a constant equal to the value of the derivative
$\psi'(0)$. Using this fact, it is possible to arrange things so that $\left| \frac{dx}{dt}(t) \right| = 1$ for all $t$
close to 0. Such a parameter is said to be natural. The condition that the curve $x(t)$
belong to the hypersurface (7.64) gives the equality $F(x(t)) = 0$, which is satisfied
for all $t$. Differentiating this relationship with respect to $t$, we obtain that the vector
$p$ lies in the space $T_0X$. And conversely, an arbitrary vector contained in $T_0X$ can
be represented in the form $\frac{dx}{dt}(0)$ for some curve $x(t)$. This curve, of course, is not
uniquely determined. Curves whose tangent vectors $p$ are proportional are said to
be tangent at the point 0.

⁶ For example, the circle of radius 1 with center at the origin with Cartesian coordinates x, y can be
defined not only by the formula $x = \cos t$, $y = \sin t$, but also by the formula $x = \cos\tau$, $y = -\sin\tau$
(with the replacement $t = -\tau$), or by the formula $x = \sin\tau$, $y = \cos\tau$ (replacement $t = \frac{\pi}{2} - \tau$).

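Let us denote by $n$ a unit vector orthogonal to the tangent space $T_0X$. There are
two such vectors, $n$ and $-n$, and we shall choose one of them. For example, we may
set

\[ n = \frac{\operatorname{grad} F}{|\operatorname{grad} F|}(0). \tag{7.66} \]

We define the vector $\frac{d^2x}{dt^2}$ as $\frac{d}{dt}\left( \frac{dx}{dt} \right)$ and set

\[ Q = \left( \frac{d^2x}{dt^2}(0),\, n \right). \tag{7.67} \]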


Proposition 7.46 The value $Q$ depends only on the vector $p$; namely, it is a
quadratic form in its coordinates.


Proof It suffices to verify this assertion after replacing the vector $n$ in (7.67) by
any vector proportional to it, for example, $\operatorname{grad} F(0)$. Since by assumption, the curve
$x(t)$ is contained in the hypersurface (7.64), it follows that $F(x_1(t), \ldots, x_n(t)) = 0$.
Differentiating this equality twice with respect to $t$, we obtain

\[ \sum_{i=1}^{n} \frac{\partial F}{\partial x_i} \frac{dx_i}{dt} = 0, \qquad
   \sum_{i,j=1}^{n} \frac{\partial^2 F}{\partial x_i \partial x_j} \frac{dx_i}{dt} \frac{dx_j}{dt}
   + \sum_{i=1}^{n} \frac{\partial F}{\partial x_i} \frac{d^2 x_i}{dt^2} = 0. \]

Setting here $t = 0$, we see that

\[ \left( \frac{d^2x}{dt^2}(0),\, \operatorname{grad} F(0) \right)
   = -\sum_{i,j=1}^{n} \frac{\partial^2 F}{\partial x_i \partial x_j}(0)\, p_i p_j, \]

where $p = (p_1, \ldots, p_n)$. This proves the assertion.


The form $Q(p)$ is called the second quadratic form of the hypersurface. The
form $(p^2)$ is called the first quadratic form when $T_0X$ is taken as a subspace of a
Euclidean space $L$. We observe that the second quadratic form requires the selection
of one of two unit vectors ($n$ or $-n$) orthogonal to $T_0X$. This is frequently
interpreted as the selection of one side of the hypersurface in a neighborhood of the
point 0.
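As a worked illustration of the second quadratic form (the surface below is a hypothetical example chosen here, with constants $a$ and $b$ that do not appear in the text), take $n = 3$, an orthonormal basis, and a paraboloid through the origin:

```latex
\[
F(x_1, x_2, x_3) = x_3 - \tfrac{1}{2}\left( a x_1^2 + b x_2^2 \right), \qquad
\operatorname{grad} F(0) = (0, 0, 1), \qquad n = (0, 0, 1).
\]
% The tangent space T_0 X is the plane x_3 = 0. For p = (p_1, p_2, 0), the curve
\[
x(t) = \left( p_1 t,\; p_2 t,\; \tfrac{1}{2}\left( a p_1^2 + b p_2^2 \right) t^2 \right)
\]
% lies on the surface, passes through 0 at t = 0, and has tangent vector p there, so that
\[
Q(p) = \left( \frac{d^2 x}{dt^2}(0),\, n \right) = a p_1^2 + b p_2^2
     = -\sum_{i,j=1}^{3} \frac{\partial^2 F}{\partial x_i \partial x_j}(0)\, p_i p_j .
\]
```

The last equality agrees with the formula in the proof of Proposition 7.46, since here $|\operatorname{grad} F(0)| = 1$ and the only nonzero second derivatives of $F$ are $\partial^2 F/\partial x_1^2 = -a$ and $\partial^2 F/\partial x_2^2 = -b$.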

The first and second quadratic forms give us the possibility to obtain an expression
for the curvature of certain curves $x(t)$ lying in the hypersurface $X$. Let us
suppose that a curve is the intersection of a plane $L'$ containing the point 0 and the
hypersurface $X$ (even if only in an arbitrarily small neighborhood of the point 0).
Such a curve is called a plane section of the hypersurface. If we define the curve
$x(t)$ in such a way that $t$ is a natural parameter, then its curvature at the point 0 is
the number

\[ k = \left| \frac{d^2x}{dt^2}(0) \right|. \]

We assume that $k \ne 0$ and set

\[ m = \frac{1}{k} \cdot \frac{d^2x}{dt^2}(0). \]


The vector $m$ has length 1 by definition. It is said to be normal to the curve $x(t)$ at
the point 0. If the curve $x(t)$ is a plane section of the hypersurface, then $x(t)$ lies in
the plane $L'$ (for all sufficiently small $t$), and consequently, the vector

\[ \frac{dx}{dt} = \lim_{h \to 0} \frac{x(t+h) - x(t)}{h} \]

also lies in the plane $L'$. Therefore, this holds as well for the vector $d^2x/dt^2$, which
implies that it holds as well for the normal $m$. If the curve $\gamma$ is defined in terms of
the natural parameter $t$, then

\[ \left( \frac{dx}{dt},\, \frac{dx}{dt} \right) = 1. \]

Differentiating this equality with respect to $t$, we obtain that the vectors $d^2x/dt^2$
and $dx/dt$ are orthogonal. Hence the normal $m$ to the curve $\gamma$ is orthogonal to an
arbitrary tangent vector (for arbitrary definition of the curve $\gamma$ in the form $x(t)$ with
natural parameter $t$), and the vector $m$ is defined uniquely up to sign. It is obvious
that $L' = \langle m, p \rangle$, where $p$ is an arbitrary tangent vector.

By definition (7.67) of the second quadratic form $Q$ and taking into account the
equality $|m| = |n| = 1$, we obtain for $p = \frac{dx}{dt}(0)$ the expression

\[ Q(p) = (km, n) = k(m, n) = k\cos\varphi, \tag{7.68} \]

where $\varphi$ is the angle between the vectors $m$ and $n$. The expression $k\cos\varphi$ is denoted
by $\bar k$ and is called the normal curvature of the hypersurface $X$ in the direction $p$.
We recall that here $n$ denotes the chosen unit vector orthogonal to the tangent space
$T_0X$, and $m$ is the normal to the curve to which the vector $p$ is tangent. An analogous
formula for an arbitrary parametric definition of the curve $x(t)$ (where $t$ is not
necessarily a natural parameter) also uses the first quadratic form. Namely, if $\tau$ is
another parameter, while $t$ is a natural parameter, then by formula (7.65), now instead
of the vector $p$, we obtain $p' = p\,\psi'(0)$. Since $Q$ is a quadratic form, it follows
that $Q(p\,\psi'(0)) = \psi'(0)^2 Q(p)$, and likewise $(p'^2) = \psi'(0)^2 (p^2)$. Writing again $p$
for the tangent vector, instead of formula (7.68), we now obtain

\[ \frac{Q(p)}{(p^2)} = k\cos\varphi. \tag{7.69} \]

Here the first quadratic form $(p^2)$ is already involved as well as the second quadratic
form $Q(p)$, but now (7.69), in contrast to (7.68), holds for an arbitrary choice of
parameter $t$ on the curve $\gamma$.

The point of the term normal curvature given above is the following. The section
of the hypersurface $X$ by the plane $L'$ is said to be normal if $n \in L'$. The vector $n$
defined by formula (7.66) is orthogonal to the tangent plane $T_0X$. But in the plane $L'$
there is also the vector $p$ tangent to the curve $\gamma$, and the normal vector $m$ orthogonal
to it. Thus in the case of a normal section, $n = \pm m$, which means that in formula (7.68),
the angle $\varphi$ is equal to 0 or $\pi$. Conversely, from the equality $|\cos\varphi| = 1$, it follows
that $n \in L'$. Thus in the case of a normal section, the normal curvature $\bar k$ differs from
$k$ only by the factor $\pm 1$ and is defined by the relationship

\[ \bar k = \frac{Q(p)}{|p|^2}. \]

Since $L' = \langle m, p \rangle$, it follows that all normal sections correspond to straight lines in
the plane $L'$. For each line, there exists a unique normal section containing this line.
In other words, we “rotate” the plane $L'$ about the vector $m$, considering all obtained
planes $\langle m, p \rangle$, where $p$ is a vector in the tangent hyperplane $T_0X$. Thus all normal
sections of the hypersurface $X$ are obtained.

We shall now employ Theorem 7.38. In our case, it gives an orthonormal basis
$e_1, \ldots, e_{n-1}$ in the tangent hyperplane $T_0X$ (viewed as a subspace of the Euclidean
space $L$) in which the quadratic form $Q(p)$ is brought into canonical form. In other
words, for the vector $p = u_1 e_1 + \cdots + u_{n-1} e_{n-1}$, the second quadratic form takes
the form $Q(p) = \lambda_1 u_1^2 + \cdots + \lambda_{n-1} u_{n-1}^2$. Since the basis $e_1, \ldots, e_{n-1}$ is orthonormal,
we have in this case

\[ \frac{u_i}{|p|} = \frac{(p, e_i)}{|p|} = \cos\varphi_i, \tag{7.70} \]

where $\varphi_i$ is the angle between the vectors $p$ and $e_i$. From this we obtain for the
normal curvature $\bar k$ of the normal section $\gamma$ the formula

\[ \bar k = \frac{Q(p)}{|p|^2} = \sum_{i=1}^{n-1} \lambda_i \left( \frac{u_i}{|p|} \right)^2
   = \sum_{i=1}^{n-1} \lambda_i \cos^2\varphi_i, \tag{7.71} \]

where $p$ is an arbitrary tangent vector to the curve $\gamma$ at the point 0. Relationships
(7.70) and (7.71) are called Euler’s formula. The numbers $\lambda_i$ are called principal
curvatures of the hypersurface $X$ at the point 0.

In the case $n = 3$, the hypersurface (7.64) is an ordinary surface and has two principal
curvatures $\lambda_1$ and $\lambda_2$. Taking into account the fact that $\cos^2\varphi_1 + \cos^2\varphi_2 = 1$,
Euler’s formula takes the form

\[ \bar k = \lambda_1\cos^2\varphi_1 + \lambda_2\cos^2\varphi_2 = (\lambda_1 - \lambda_2)\cos^2\varphi_1 + \lambda_2. \tag{7.72} \]

Suppose $\lambda_1 \ge \lambda_2$. Then from (7.72), it is clear that the normal curvature $\bar k$ assumes
a maximum (equal to $\lambda_1$) for $\cos^2\varphi_1 = 1$ and a minimum (equal to $\lambda_2$) for
$\cos^2\varphi_1 = 0$. This assertion is called the extremal property of the principal curvatures
of the surface. If $\lambda_1$ and $\lambda_2$ have the same sign ($\lambda_1\lambda_2 > 0$), then as can be
seen from (7.72), an arbitrary normal section of a surface at a given point 0 has
its curvature of the same sign, and therefore, all normal sections have convexity in
the same direction, and near the point 0, the surface lies on one side of its tangent
plane; see Fig. 7.10(a). Such points are called elliptic. If $\lambda_1$ and $\lambda_2$ have different
signs ($\lambda_1\lambda_2 < 0$), then as can be seen from formula (7.72), there exist normal
sections with opposite directions of convexity, and at points near 0, the surface is located
on different sides of its tangent plane; see Fig. 7.10(b). Such points are called
hyperbolic.⁷

Fig. 7.10 Elliptic (a) and hyperbolic (b) points

From all this discussion, it is evident that the product of principal curvatures
$K = \lambda_1\lambda_2$ characterizes some important properties of a surface (called “internal
geometric properties” of the surface). This product is called the Gaussian or total
curvature of the surface.
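Euler’s formula (7.72) and the classification of points by the sign of the Gaussian curvature can be illustrated with a short Python sketch. The function names `normal_curvature` and `point_type`, and the sample values of the principal curvatures, are assumptions made for this example only.

```python
import math

def normal_curvature(lam1, lam2, phi1):
    """Euler's formula (7.72) for n = 3; phi1 is the angle between p and e_1."""
    return (lam1 - lam2) * math.cos(phi1) ** 2 + lam2

def point_type(lam1, lam2):
    """Classify the point by the sign of the Gaussian curvature K = lam1 * lam2."""
    K = lam1 * lam2
    if K > 0:
        return "elliptic"
    if K < 0:
        return "hyperbolic"
    return "neither (K = 0)"

# Sample the normal curvature over all directions in the tangent plane.
lam1, lam2 = 2.0, -1.0
samples = [normal_curvature(lam1, lam2, 2 * math.pi * i / 360) for i in range(360)]
print(max(samples), min(samples), point_type(lam1, lam2))
```

With $\lambda_1 \ge \lambda_2$, the sampled values stay between $\lambda_2$ and $\lambda_1$, which is the extremal property of the principal curvatures. For the hyperbolic sample above, the normal curvature takes both signs.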


7.7 Pseudo-Euclidean Spaces 


Many of the theorems proved in the previous sections of this chapter remain valid
if in the definition of Euclidean space we forgo the requirement of positive definiteness
of the quadratic form $(x^2)$ or replace it with something weaker. Without this
condition, the inner product $(x, y)$ does not differ at all from an arbitrary symmetric
bilinear form. As Theorem 6.6 shows, it is uniquely defined by the quadratic form
$(x^2)$.

We thus obtain a theory that fully coincides with the theory of quadratic
forms that we presented in Chap. 6. The fundamental theorem (on bringing a
quadratic form into canonical form) consists in the existence of an orthogonal
basis $e_1, \ldots, e_n$, that is, a basis for which $(e_i, e_j) = 0$ for all $i \ne j$. Then for the
vector $x = x_1 e_1 + \cdots + x_n e_n$, the quadratic form $(x^2)$ is equal to $\lambda_1 x_1^2 + \cdots + \lambda_n x_n^2$.
Moreover, this is true for vector spaces and bilinear forms over an arbitrary field $K$
of characteristic different from 2. The concept of an isomorphism of spaces makes
sense also in this case; as previously, it is necessary to require that the scalar product
$(x, y)$ be preserved.

⁷ Examples of surfaces consisting entirely of elliptic points are ellipsoids, hyperboloids of two
sheets, and elliptic paraboloids, while surfaces consisting entirely of hyperbolic points include
hyperboloids of one sheet and hyperbolic paraboloids.

The theory of such spaces (defined up to isomorphism) with a bilinear or 
quadratic form is of great interest (for example, in the case K = Q, the field of 
rational numbers). But here we are interested in real spaces. In this case, formula 
(6.28) and Theorem 6.17 (law of inertia) show that up to isomorphism, a space is 
uniquely defined by its rank and the index of inertia of the associated quadratic form. 

We shall further restrict attention to an examination of real vector spaces with a 
nonsingular symmetric bilinear form (x, y). Let us recall that the nonsingularity of 
a bilinear form implies that its rank (that is, the rank of its matrix in an arbitrary 
basis of the space) is equal to dim L. In other words, this means that its radical is
equal to (0); that is, if the vector x is such that (x, y) = 0 for all vectors y € L, then 
x = 0 (see Sect. 6.2). For a Euclidean space, this condition follows automatically 
from property (4) of the definition (it suffices to set there y = x). 

Formula (6.28) shows that with these conditions, there exists a basis $e_1, \ldots, e_n$
of the space $L$ for which

\[ (e_i, e_j) = 0 \ \text{for } i \ne j, \qquad (e_i^2) = \pm 1. \]

Such a basis is called, as it was previously, orthonormal. In it, the form $(x^2)$ can be
written in the form

\[ (x^2) = x_1^2 + \cdots + x_s^2 - x_{s+1}^2 - \cdots - x_n^2, \]

and the number $s$ is called the index of inertia of both the quadratic form $(x^2)$ and
the pseudo-Euclidean space $L$.

A new difficulty appears that was not present for Euclidean spaces if the quadratic
form $(x^2)$ is neither positive nor negative definite, that is, if its index of inertia $s$ is
positive but less than $n$. In this case, the restriction of the bilinear form $(x, y)$ to a
subspace $L' \subset L$ can turn out to be singular, even if the original bilinear form $(x, y)$
in $L$ was nonsingular. For example, it is clear that in $L$, there exists a vector $x \ne 0$
for which $(x^2) = 0$, and then the restriction of $(x, y)$ to the one-dimensional subspace
$\langle x \rangle$ is singular (identically equal to zero).

Thus let us consider a vector space $L$ with a nonsingular symmetric bilinear form
$(x, y)$ defined on it. In this case, we shall use many concepts and much of the notation
used for Euclidean spaces earlier. Hence, vectors $x$ and $y$ are called orthogonal
if $(x, y) = 0$. Subspaces $L_1$ and $L_2$ are called orthogonal if $(x, y) = 0$ for all vectors
$x \in L_1$ and $y \in L_2$, and we express this by writing $L_1 \perp L_2$. The orthogonal complement
of the subspace $L' \subset L$ with respect to the bilinear form $(x, y)$ is denoted by
$(L')^{\perp}$. However, there is an important difference from the case of Euclidean spaces,
in connection with which it will be useful to give the following definition.


Definition 7.47 A subspace $L' \subset L$ is said to be nondegenerate if the bilinear form
obtained by restricting the form $(x, y)$ to $L'$ is nonsingular. In the contrary case, $L'$
is said to be degenerate.




By Theorem 6.9, in the case of a nondegenerate subspace $L'$ we have the orthogonal
decomposition

\[ L = L' \oplus (L')^{\perp}. \tag{7.73} \]

In the case of a Euclidean space, as we have seen, every subspace $L'$ is nondegenerate,
and the decomposition (7.73) holds without any additional conditions. As the
following example will show, in a pseudo-Euclidean space, the condition of nondegeneracy
of a subspace $L'$ for the decomposition (7.73) is in fact essential.


Example 7.48 Let us consider a three-dimensional space $L$ with a symmetric bilinear
form defined in some chosen basis by the formula

\[ (x, y) = x_1 y_1 + x_2 y_2 - x_3 y_3, \]

where the $x_i$ are the coordinates of the vector $x$, and the $y_i$ are the coordinates
of the vector $y$. Let $L' = \langle e \rangle$, where the vector $e$ has coordinates $(0, 1, 1)$. Then
as is easily verified, $(e, e) = 0$, and therefore, the restriction of the form $(x, y)$ to
$L'$ is identically equal to zero. This implies that the subspace $L'$ is degenerate. Its
orthogonal complement $(L')^{\perp}$ is two-dimensional and consists of all vectors $z \in L$
with coordinates $(z_1, z_2, z_3)$ for which $z_2 = z_3$. Consequently, $L' \subset (L')^{\perp}$, and the
intersection $L' \cap (L')^{\perp} = L'$ contains nonnull vectors. This implies that the sum
$L' + (L')^{\perp}$ is not a direct sum. Furthermore, it is obvious that $L' + (L')^{\perp} \ne L$.
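The claims of Example 7.48 can be checked directly. In the Python sketch below, the names `form` and `in_complement` are hypothetical helpers introduced for illustration; vectors are given by their coordinates in the chosen basis.

```python
def form(x, y):
    """Bilinear form of Example 7.48 in the chosen basis."""
    return x[0] * y[0] + x[1] * y[1] - x[2] * y[2]

e = (0, 1, 1)  # spans the line L'

def in_complement(z):
    """z lies in the orthogonal complement of L' iff (z, e) = 0, i.e. z_2 = z_3."""
    return form(z, e) == 0

print(form(e, e))                 # 0: the restriction of the form to L' is zero
print(in_complement(e))           # True: L' is contained in its own complement
print(in_complement((0, 0, 1)))   # False: this vector is outside L' + (L')^perp
```

Since $L'$ is contained in $(L')^{\perp}$, the sum $L' + (L')^{\perp}$ coincides with the two-dimensional space $(L')^{\perp}$, so a vector such as $(0, 0, 1)$ that is not orthogonal to $e$ cannot belong to it.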


It follows from the nonsingularity of a bilinear form $(x, y)$ that the determinant
of its matrix (in an arbitrary basis) is different from zero. If this matrix is written in
the basis $e_1, \ldots, e_n$, then its determinant is equal to

\[ \begin{vmatrix}
(e_1, e_1) & (e_1, e_2) & \cdots & (e_1, e_n) \\
(e_2, e_1) & (e_2, e_2) & \cdots & (e_2, e_n) \\
\vdots & \vdots & \ddots & \vdots \\
(e_n, e_1) & (e_n, e_2) & \cdots & (e_n, e_n)
\end{vmatrix}, \tag{7.74} \]

and just as in the case of a Euclidean space, we shall call it the Gram determinant
of the basis $e_1, \ldots, e_n$. Of course, this determinant depends on the choice of
basis, but its sign does not depend on the basis. Indeed, if $A$ and $A'$ are matrices
of our bilinear form in two different bases, then they are related by the equality
$A' = C^* A C$, where $C$ is a nonsingular transition matrix, from which it follows that
$|A'| = |A| \cdot |C|^2$. Thus the sign of the Gram determinant is the same for all bases.
As noted above, for a nondegenerate subspace $L' \subset L$, we have the decomposition
(7.73), which yields the equality

\[ \dim L = \dim L' + \dim (L')^{\perp}. \tag{7.75} \]

But equality (7.75) holds as well for every subspace $L' \subset L$, although as we saw in
Example 7.48, the decomposition (7.73) may already not hold in the general case.




Indeed, by Theorem 6.3, we can write an arbitrary bilinear form $(x, y)$ in the
space $L$ in the form $(x, y) = (x, \mathcal{A}(y))$, where $\mathcal{A} : L \to L^*$ is some linear transformation.
From the nonsingularity of the bilinear form $(x, y)$ follows the nonsingularity
of the transformation $\mathcal{A}$. In other words, the transformation $\mathcal{A}$ is an isomorphism,
that is, its kernel is equal to $(0)$, and in particular, for an arbitrary subspace $L' \subset L$,
we have the equality $\dim \mathcal{A}(L') = \dim L'$. On the other hand, we can write the orthogonal
complement $(L')^{\perp}$ in the form $(\mathcal{A}(L'))^{a}$, using the notion of the annihilator
introduced in Sect. 3.7. On the basis of what we have said above and formula (3.54)
for the annihilator, we have the relationship

\[ \dim (\mathcal{A}(L'))^{a} = \dim L - \dim \mathcal{A}(L') = \dim L - \dim L', \]

that is, $\dim (L')^{\perp} = \dim L - \dim L'$. We note that this argument holds for vector
spaces $L$ defined not only over the real numbers, but over any field.
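The invariance of the sign of the Gram determinant under a change of basis, which follows from $|A'| = |A| \cdot |C|^2$, can be illustrated with exact integer arithmetic. In the Python sketch below, the form with matrix `G` (index of inertia 2 in dimension 3) and the transition matrix `C` are arbitrary sample data chosen for this example, and the helper names are assumptions.

```python
def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

G = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]   # Gram matrix in an orthonormal basis
C = [[2, 1, 0], [0, 1, 3], [1, 0, 1]]    # a nonsingular transition matrix
G_new = matmul(matmul(transpose(C), G), C)

print(det3(G), det3(C), det3(G_new))     # -1 5 -25
```

The determinant changes by the positive factor $|C|^2 = 25$, so its sign stays negative in both bases.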

The spaces that we have examined are defined (up to isomorphism) by the index
of inertia $s$, which can take values from 0 to $n$. By what we have said above, the sign
of the Gram determinant of an arbitrary basis is equal to $(-1)^{n-s}$. It is obvious that
if we replace the inner product $(x, y)$ in the space $L$ by $-(x, y)$, we shall preserve all
of its essential properties, but the index of inertia $s$ will be replaced by $n - s$, whence
in what follows, we shall assume that $n/2 \le s \le n$. The case $s = n$ corresponds
to a Euclidean space. There exists, however, a phenomenon whose explanation is
at present not completely clear; the most interesting questions in mathematics and
physics were until now connected with two types of spaces: those in which the index
of inertia $s$ is equal to $n$ and those for which $s = n - 1$. The theory of Euclidean
spaces ($s = n$) has been up till now the topic of this chapter. In the remaining part,
we shall consider the other case: $s = n - 1$. In the sequel, we shall call such spaces
pseudo-Euclidean spaces (although sometimes, this term is used when $(x, y)$ is an
arbitrary nonsingular symmetric bilinear form neither positive nor negative definite,
that is, with index of inertia $s \ne 0, n$).

Thus a pseudo-Euclidean space of dimension $n$ is a vector space $L$ equipped with
a symmetric bilinear form $(x, y)$ such that in some basis $e_1, \ldots, e_n$, the quadratic
form $(x^2)$ takes the form

\[ x_1^2 + \cdots + x_{n-1}^2 - x_n^2. \tag{7.76} \]

As in the case of a Euclidean space, we shall call such bases orthonormal.

The best-known application of pseudo-Euclidean spaces is related to the special
theory of relativity. According to an idea put forward by Minkowski, in this theory,
one considers a four-dimensional space whose vectors are called space-time events
(we mentioned this earlier, on p. 86). They have coordinates $(x, y, z, t)$, and the
space is equipped with a quadratic form $x^2 + y^2 + z^2 - t^2$ (here the speed of light
is assumed to be 1). The pseudo-Euclidean space thus obtained is called Minkowski
space. By analogy with the physical sense of these concepts in Minkowski space, in
an arbitrary pseudo-Euclidean space, a vector $x$ is said to be spacelike if $(x^2) > 0$,
while such a vector is said to be timelike if $(x^2) < 0$, and lightlike, or isotropic, if
$(x^2) = 0$.⁸

Fig. 7.11 A pseudo-Euclidean plane


Example 7.49 Let us consider the simplest case of a pseudo-Euclidean space $L$ with
$\dim L = 2$ and index of inertia $s = 1$. By the general theory, in this space there exists
an orthonormal basis, in this case the basis $e_1, e_2$, for which

\[ (e_1^2) = 1, \qquad (e_2^2) = -1, \qquad (e_1, e_2) = 0, \tag{7.77} \]

and the scalar square of the vector $x = x_1 e_1 + x_2 e_2$ is equal to $(x^2) = x_1^2 - x_2^2$.
However, it is easier to write the formulas connected with the space $L$ in the basis
consisting of lightlike vectors $f_1, f_2$, after setting

\[ f_1 = \frac{e_1 + e_2}{2}, \qquad f_2 = \frac{e_1 - e_2}{2}. \tag{7.78} \]

Then $(f_1^2) = (f_2^2) = 0$, $(f_1, f_2) = \frac{1}{2}$, and the scalar square of the vector
$x = x_1 f_1 + x_2 f_2$ is equal to $(x^2) = x_1 x_2$. The lightlike vectors are located on the
coordinate axes; see Fig. 7.11. The timelike vectors comprise the second and fourth
quadrants, and the spacelike vectors make up the first and third quadrants.
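The identities of Example 7.49 can be confirmed with exact rational arithmetic. In the following Python sketch, the helper names `form`, `from_light_coords`, and `kind` are assumptions made for illustration; vectors are stored by their coordinates in the orthonormal basis $e_1, e_2$ of (7.77).

```python
from fractions import Fraction

def form(x, y):
    """Inner product in the orthonormal basis e1, e2 of (7.77)."""
    return x[0] * y[0] - x[1] * y[1]

half = Fraction(1, 2)
f1 = (half, half)     # (e1 + e2)/2
f2 = (half, -half)    # (e1 - e2)/2

def from_light_coords(x1, x2):
    """Coordinates in e1, e2 of the vector x1*f1 + x2*f2."""
    return (x1 * f1[0] + x2 * f2[0], x1 * f1[1] + x2 * f2[1])

def kind(x1, x2):
    """Type of the vector x1*f1 + x2*f2, decided by the sign of its scalar square."""
    v = from_light_coords(x1, x2)
    sq = form(v, v)
    if sq > 0:
        return "spacelike"
    if sq < 0:
        return "timelike"
    return "lightlike"

print(form(f1, f1), form(f2, f2), form(f1, f2))   # 0 0 1/2
v = from_light_coords(3, 5)
print(form(v, v))                                 # 15, which equals 3 * 5
print(kind(1, 1), kind(-1, 1), kind(1, 0))        # spacelike timelike lightlike
```

The sign of $x_1 x_2$ decides the type of the vector, which matches the description by quadrants in the example.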


Definition 7.50 The set $V \subset L$ consisting of all lightlike vectors of a pseudo-Euclidean
space is called the light cone (or isotropic cone).


That we call the set $V$ a cone suggests that if it contains some vector $e$, then it
contains the entire straight line $\langle e \rangle$, which follows at once from the definition. The
set of timelike vectors is called the interior of the cone $V$, while the set of spacelike
vectors makes up its exterior. In the space from Example 7.49, the light cone $V$ is
the union of two straight lines $\langle f_1 \rangle$ and $\langle f_2 \rangle$. A more visual representation of the
light cone is given by the following example.


⁸ We remark that this terminology differs from what is generally used: Our “spacelike” vectors are
usually called “timelike,” and conversely. The difference is explained by the condition $s = n - 1$
that we have assumed. In the conventional definition of Minkowski space, one usually considers
the quadratic form $-x^2 - y^2 - z^2 + t^2$, with index of inertia $s = 1$, and we need to multiply it by
$-1$ in order that the condition $s \ge n/2$ be satisfied.




Fig. 7.12. The light cone 


Example 7.51 We consider the pseudo-Euclidean space L with dim L = 3 and index 
of inertia s = 2. With the selection of an orthonormal basis e1, e2, e3 such that 


$$(e_1^2) = (e_2^2) = 1, \quad (e_3^2) = -1, \quad (e_i, e_j) = 0 \ \text{ for all } i \ne j,$$

the light cone V is defined by the equation $x_1^2 + x_2^2 - x_3^2 = 0$. This is an ordinary
right circular cone in three-dimensional space, familiar from a course in analytic 
geometry; see Fig. 7.12. 


We now return to the general case of a pseudo-Euclidean space L of dimension n 
and consider the light cone V in L in greater detail. First of all, let us verify that it is 
“completely circular.” By this we mean the following.


Lemma 7.52 Although the cone V contains along with every vector x the entire
line $\langle x \rangle$, it contains no two-dimensional subspace.


Proof Let us assume that V contains a two-dimensional subspace $\langle x, y \rangle$. We choose
a vector $e \in L$ such that $(e^2) = -1$. Then the line $\langle e \rangle$ is a nondegenerate subspace of
L, and we can use the decomposition (7.73):

$$L = \langle e \rangle \oplus \langle e \rangle^\perp. \tag{7.79}$$

From the law of inertia it follows that $\langle e \rangle^\perp$ is a Euclidean space. Let us apply the
decomposition (7.79) to our vectors $x, y \in V$. We obtain

$$x = \alpha e + u, \quad y = \beta e + v, \tag{7.80}$$

where u and v are vectors in the Euclidean space $\langle e \rangle^\perp$, while $\alpha$ and $\beta$ are some
scalars.

The conditions $(x^2) = 0$ and $(y^2) = 0$ can be written as $\alpha^2 = (u^2)$ and $\beta^2 = (v^2)$.
Using the same reasoning for the vector $x + y = (\alpha + \beta)e + u + v$, which by the
assumption $\langle x, y \rangle \subset V$ is also contained in V, we obtain the equality

$$(\alpha + \beta)^2 = (u + v, u + v) = (u^2) + 2(u, v) + (v^2) = \alpha^2 + 2(u, v) + \beta^2.$$

Canceling the terms $\alpha^2$ and $\beta^2$ on the left- and right-hand sides of the equality, we
obtain that $\alpha\beta = (u, v)$, that is, $(u, v)^2 = \alpha^2\beta^2 = (u^2) \cdot (v^2)$. Thus for the vectors
u and v in the Euclidean space $\langle e \rangle^\perp$, the Cauchy-Schwarz inequality reduces to
an equality, from which it follows that u and v are proportional (see p. 218). Let
$v = \lambda u$. Then the vector $y - \lambda x = (\beta - \lambda\alpha)e$ is also lightlike. Since $(e^2) = -1$, it
follows that $\beta = \lambda\alpha$. But then from the relationship (7.80), it follows that $y = \lambda x$,
and this contradicts the assumption $\dim \langle x, y \rangle = 2$.


Let us select an arbitrary timelike vector $e \in L$. Then in the orthogonal complement
$\langle e \rangle^\perp$ of the line $\langle e \rangle$, the bilinear form (x, y) determines a positive definite
quadratic form. This implies that $\langle e \rangle^\perp \cap V = (0)$, and the hyperplane $\langle e \rangle^\perp$ divides
the set $V \setminus 0$ into two parts, $V_+$ and $V_-$, consisting of vectors $x \in V$ such that in
each part, the condition $(e, x) > 0$ or $(e, x) < 0$ is respectively satisfied. We shall
call these sets $V_+$ and $V_-$ poles of the light cone V. In Fig. 7.12, the plane $\langle e_1, e_2 \rangle$
divides V into “upper” and “lower” poles $V_+$ and $V_-$ for the vector $e = e_3$.

The partition $V \setminus 0 = V_+ \cup V_-$ that we have constructed rested on the choice of
some timelike vector e, and ostensibly, it must depend on it (for example, a change
in the vector e to $-e$ interchanges the poles $V_+$ and $V_-$). We shall now show that
the decomposition $V \setminus 0 = V_+ \cup V_-$, without taking into account how we designate
each pole, does not depend on the choice of vector e, that is, it is a property of
the pseudo-Euclidean space itself. To do so, we shall require the following, almost
obvious, assertion.


Lemma 7.53 Let $L'$ be a subspace of the pseudo-Euclidean space L of dimension
$\dim L' \ge 2$. Then the following statements are equivalent:

(1) $L'$ is a pseudo-Euclidean space.
(2) $L'$ contains a timelike vector.
(3) $L'$ contains two linearly independent lightlike vectors.


Proof If $L'$ is a pseudo-Euclidean space, then statements (2) and (3) obviously
follow from the definition of a pseudo-Euclidean space.

Let us show that statement (2) implies statement (1). Suppose $L'$ contains a timelike
vector e. That is, $(e^2) < 0$, whence the subspace $\langle e \rangle$ is nondegenerate, and
therefore, we have the decomposition (7.79), and moreover, as follows from the
law of inertia, the subspace $\langle e \rangle^\perp$ is a Euclidean space. If the subspace $L'$ were
degenerate, then there would exist a nonnull vector $u \in L'$ such that $(u, x) = 0$ for
all $x \in L'$, and in particular, for the vectors e and u. The condition $(u, e) = 0$ implies
that the vector u is contained in $\langle e \rangle^\perp$, while the condition $(u, u) = 0$ implies that
the vector u is lightlike. But this is impossible, since the subspace $\langle e \rangle^\perp$ is a
Euclidean space and cannot contain lightlike vectors. This contradiction shows that the
subspace $L'$ is nondegenerate, and therefore, it exhibits the decomposition (7.73).
Taking into account the law of inertia, it follows from this that the subspace $L'$ is a
pseudo-Euclidean space.

Let us show that statement (3) implies statement (1). Suppose the subspace $L'$
contains linearly independent lightlike vectors $f_1$ and $f_2$. We shall show that the
plane $\Pi = \langle f_1, f_2 \rangle$ contains a timelike vector e. Then obviously, e is contained
in $L'$, and by what was proved above, the subspace $L'$ is a pseudo-Euclidean space.
Every vector $e \in \Pi$ can be represented in the form $e = \alpha f_1 + \beta f_2$. From this, we
obtain $(e^2) = 2\alpha\beta(f_1, f_2)$. We note that $(f_1, f_2) \ne 0$, since in the contrary case,
for each vector $e \in \Pi$, the equality $(e^2) = 0$ would be satisfied, implying that the
plane $\Pi$ lies completely in the light cone V, which contradicts Lemma 7.52. Thus
$(f_1, f_2) \ne 0$, and choosing coordinates $\alpha$ and $\beta$ such that the sign of their product
is opposite to the sign of $(f_1, f_2)$, we obtain the vector e, for which $(e^2) < 0$.

Fig. 7.13 The plane $\Pi$ in a three-dimensional pseudo-Euclidean space: panels (a), (b), (c)


Example 7.54 Let us consider the three-dimensional pseudo-Euclidean space L
from Example 7.51 and a plane $\Pi$ in L. The property of a plane $\Pi$ being a Euclidean
space, a pseudo-Euclidean space, or degenerate is clearly illustrated in Fig. 7.13.

In Fig. 7.13(a), the plane $\Pi$ intersects the light cone V in two lines, corresponding
to two linearly independent lightlike vectors. Clearly, this is equivalent to the
condition that $\Pi$ also intersects the interior of the light cone, which consists of
timelike vectors, and therefore is a pseudo-Euclidean plane. In Fig. 7.13(c), it is
shown that the plane $\Pi$ intersects V only in its vertex, that is, $\Pi \cap V = (0)$. This
implies that the plane $\Pi$ is a Euclidean space, since every nonnull vector $e \in \Pi$ lies
outside the cone V, that is, $(e^2) > 0$.

Finally, in Fig. 7.13(b) is shown the intermediate variant: the plane $\Pi$ intersects
the cone V in a single line, that is, it is tangent to it. Since the plane $\Pi$ contains
lightlike vectors (lying on this line), it follows that it cannot be a Euclidean space,
and since it does not contain timelike vectors, it follows by Lemma 7.53 that it
cannot be a pseudo-Euclidean space. This implies that $\Pi$ is degenerate.

This is not difficult to verify in another way if we write down the matrix of the
restriction of the inner product to the plane $\Pi$. Suppose that in the orthonormal basis
$e_1, e_2, e_3$ from Example 7.51, this plane is defined by the equation $x_3 = \alpha x_1 + \beta x_2$.
Then the vectors $g_1 = e_1 + \alpha e_3$ and $g_2 = e_2 + \beta e_3$ form a basis of $\Pi$ in which
the restriction of the inner product has matrix

$$\begin{pmatrix} 1 - \alpha^2 & -\alpha\beta \\ -\alpha\beta & 1 - \beta^2 \end{pmatrix}$$

with determinant $\Delta = (1 - \alpha^2)(1 - \beta^2) - (\alpha\beta)^2$. On the other hand, the assumption of
tangency of the plane $\Pi$ and cone V amounts to the discriminant of the quadratic form
$x_1^2 + x_2^2 - (\alpha x_1 + \beta x_2)^2$ in the variables $x_1$ and $x_2$ being equal to zero. It is easily
verified that this discriminant is equal to $-\Delta$, and this implies that it is zero precisely
when the determinant of this matrix is zero.
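The three situations of Fig. 7.13 can be told apart by the sign of the determinant of the Gram matrix of the basis $g_1, g_2$. The following minimal sketch in Python is an illustration only: the function names are assumptions, and it covers just the planes that can be written as $x_3 = \alpha x_1 + \beta x_2$.

```python
from fractions import Fraction

def inner(u, v):
    """Inner product with Gram matrix diag(1, 1, -1), as in Example 7.51."""
    return u[0] * v[0] + u[1] * v[1] - u[2] * v[2]

def gram_determinant(alpha, beta):
    """Determinant of the Gram matrix of g1 = e1 + alpha*e3, g2 = e2 + beta*e3."""
    g1, g2 = (1, 0, alpha), (0, 1, beta)
    return inner(g1, g1) * inner(g2, g2) - inner(g1, g2) ** 2

def plane_type(alpha, beta):
    """Type of the plane x3 = alpha*x1 + beta*x2, read off from the determinant."""
    delta = gram_determinant(alpha, beta)
    if delta > 0:
        # A positive determinant means a definite form; it must be positive
        # definite, since the ambient form has only one negative square.
        return "Euclidean"
    if delta < 0:
        return "pseudo-Euclidean"
    return "degenerate"
```

Expanding the determinant gives $1 - \alpha^2 - \beta^2$, so the plane is degenerate exactly when $\alpha^2 + \beta^2 = 1$, for example for $\alpha = 3/5$, $\beta = 4/5$.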




Theorem 7.55 The partition of the light cone V into two poles $V_+$ and $V_-$ does
not depend on the choice of timelike vector e. In particular, the linearly independent
lightlike vectors x and y lie in one pole if and only if $(x, y) < 0$.


Proof Let us assume that for some choice of timelike vector e, the lightlike vectors
x and y lie in one pole of the light cone V, and let us show that then, for any other
choice of e, they will always belong to the same pole. The case that the vectors x and y are
proportional, that is, $y = \lambda x$, is obvious. Indeed, since $\langle e \rangle^\perp \cap V = (0)$, it follows
that $(e, x) \ne 0$, and this implies that the vectors x and y belong to one pole if and
only if $\lambda > 0$, independent of the choice of the vector e.

Now let us consider the case that x and y are linearly independent. Then
$(x, y) \ne 0$, since otherwise, the entire plane $\langle x, y \rangle$ would be contained in the light
cone V, which by Lemma 7.52, is impossible. Let us prove that regardless of what
timelike vector e we have chosen for the partition $V \setminus 0 = V_+ \cup V_-$, the vectors
$x, y \in V \setminus 0$ belong to one pole if and only if $(x, y) < 0$. Let us note that this
question, strictly speaking, relates not to the entire space L, but only to the subspace
$\langle e, x, y \rangle$, whose dimension, by the assumptions we have made, is equal to 2 or 3,
depending on whether the vector e does or does not lie in the plane $\langle x, y \rangle$.

Let us consider first the case $\dim \langle e, x, y \rangle = 2$, that is, $e \in \langle x, y \rangle$. Then let us set
$e = \alpha x + \beta y$. Consequently, $(e, x) = \beta(x, y)$ and $(e, y) = \alpha(x, y)$, since $x, y \in V$.
By definition, vectors x and y are in the same pole if and only if $(e, x)(e, y) > 0$.
But since $(e, x)(e, y) = \alpha\beta(x, y)^2$, this condition is equivalent to the inequality
$\alpha\beta > 0$. The vector e is timelike, and therefore, $(e^2) < 0$, and in view of the equality
$(e^2) = 2\alpha\beta(x, y)$, we obtain that the condition $\alpha\beta > 0$ is equivalent to $(x, y) < 0$.

Let us now consider the case that $\dim \langle e, x, y \rangle = 3$. The space $\langle e, x, y \rangle$ contains
the timelike vector e. Consequently, by Lemma 7.53, it is a pseudo-Euclidean space,
and its subspace $\langle x, y \rangle$ is nondegenerate, since $(x, y) \ne 0$ and $(x^2) = (y^2) = 0$.
Thus here the decomposition (7.73) takes the form

$$\langle e, x, y \rangle = \langle x, y \rangle \oplus \langle h \rangle, \tag{7.81}$$

where the space $\langle h \rangle = \langle x, y \rangle^\perp$ is one-dimensional. On the left-hand side of the
decomposition (7.81) stands a three-dimensional pseudo-Euclidean space, and the
space $\langle x, y \rangle$ is a two-dimensional pseudo-Euclidean space; therefore, by the law
of inertia, the space $\langle h \rangle$ is a Euclidean space. Thus for the vector e, we have the
representation

$$e = \alpha x + \beta y + \gamma h, \quad (h, x) = 0, \quad (h, y) = 0.$$

From this follow the equalities

$$(e, x) = \beta(x, y), \quad (e, y) = \alpha(x, y), \quad (e^2) = 2\alpha\beta(x, y) + \gamma^2 (h^2).$$

Taking into account the fact that $(e^2) < 0$ and $(h^2) > 0$, from the last of these
relationships, we obtain that $\alpha\beta(x, y) < 0$. The condition that the vectors x and y
lie in one pole can be expressed as the inequality $(e, x)(e, y) > 0$, that is, $\alpha\beta > 0$.
Since $\alpha\beta(x, y) < 0$, it follows as in the previous case that this is equivalent to the
condition $(x, y) < 0$.
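The statement of Theorem 7.55 lends itself to a numerical spot check. The sketch below, in Python, is an illustration under stated assumptions: the form is $x_1^2 + x_2^2 - x_3^2$ in an orthonormal basis, and the sample vectors are chosen by hand (integer lightlike and timelike vectors, pairwise linearly independent on the cone). It compares the pole test, which depends on a timelike vector e, with the criterion $(x, y) < 0$, which does not.

```python
def inner(u, v):
    """x1*y1 + ... + x_{n-1}*y_{n-1} - x_n*y_n in an orthonormal basis."""
    return sum(a * b for a, b in zip(u[:-1], v[:-1])) - u[-1] * v[-1]

def same_pole(e, x, y):
    """Pole test from the text: x and y lie in one pole relative to timelike e."""
    assert inner(e, e) < 0, "e must be timelike"
    return inner(e, x) * inner(e, y) > 0

def criterion(x, y):
    """Theorem 7.55: independent lightlike x, y share a pole iff (x, y) < 0."""
    return inner(x, y) < 0

# Hand-picked samples (assumptions of this sketch, not data from the text).
LIGHTLIKE = [(3, 4, 5), (3, 4, -5), (0, 1, 1), (1, 0, -1), (5, 12, 13)]
TIMELIKE = [(0, 0, 1), (1, 0, 2), (1, 1, -3), (0, 2, -3)]

def check_all():
    """Compare the e-dependent test with the e-free criterion on every sample."""
    for i, x in enumerate(LIGHTLIKE):
        for y in LIGHTLIKE[i + 1:]:
            for e in TIMELIKE:
                if same_pole(e, x, y) != criterion(x, y):
                    return False
    return True
```

A finite check of this kind proves nothing, but it is a quick way to catch a sign error when the inner product is coded with a different convention.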


Remark 7.56 As we did in Sect. 3.2 in connection with the partition of a vector 
space L by a hyperplane L’, it is possible to ascertain that the partition of the set 
$V \setminus 0$ coincides with its partition into two path-connected components $V_+$ and $V_-$.
From this we can obtain another proof of Theorem 7.55 without using any formulas. 


In a pseudo-Euclidean space, the following remarkable relationship holds.


Theorem 7.57 For every pair of timelike vectors x and y, the reverse of the
Cauchy-Schwarz inequality is satisfied:

$$(x, y)^2 \ge (x^2) \cdot (y^2), \tag{7.82}$$


which reduces to an equality if and only if x and y are proportional. 


Proof Let us consider the subspace $\langle x, y \rangle$, in which are contained all the vectors of
interest to us. If the vectors x and y are proportional, that is, $y = \lambda x$, where $\lambda$ is
some scalar, then the inequality (7.82) obviously reduces to a tautological equality.
Thus we may assume that $\dim \langle x, y \rangle = 2$, that is, we may suppose ourselves to be in
the situation considered in Example 7.49.

As we have seen, in the space $\langle x, y \rangle$, there exists a basis $f_1, f_2$ for which the
relationship $(f_1^2) = (f_2^2) = 0$, $(f_1, f_2) = \frac{1}{2}$ holds. Writing the vectors x and y in
this basis, we obtain the expressions

$$x = x_1 f_1 + x_2 f_2, \quad y = y_1 f_1 + y_2 f_2,$$

from which it follows that

$$(x^2) = x_1 x_2, \quad (y^2) = y_1 y_2, \quad (x, y) = \frac{1}{2}(x_1 y_2 + x_2 y_1).$$

Substituting these expressions into (7.82), we see that we have to verify the inequality
$(x_1 y_2 + x_2 y_1)^2 \ge 4 x_1 x_2 y_1 y_2$. Having carried out in the last inequality the obvious
transformations, we see that this is equivalent to the inequality

$$(x_1 y_2 - x_2 y_1)^2 \ge 0, \tag{7.83}$$

which holds for all real values of the variables. Moreover, it is obvious that the
inequality (7.83) reduces to an equality if and only if $x_1 y_2 - x_2 y_1 = 0$, that is, if and
only if the determinant $\begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix}$ equals 0, and this implies that the vectors
$x = (x_1, x_2)$ and $y = (y_1, y_2)$ are proportional.
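The reverse Cauchy-Schwarz inequality (7.82) is easy to test on concrete vectors. The following is a minimal sketch in Python; the function names and the choice of an orthonormal basis with a single negative square in the last coordinate are assumptions made for illustration.

```python
def inner(u, v):
    """x1*y1 + ... + x_{n-1}*y_{n-1} - x_n*y_n in an orthonormal basis."""
    return sum(a * b for a, b in zip(u[:-1], v[:-1])) - u[-1] * v[-1]

def reverse_cs_gap(x, y):
    """(x, y)^2 - (x^2)(y^2) for timelike x and y; (7.82) says this is >= 0."""
    assert inner(x, x) < 0 and inner(y, y) < 0, "both vectors must be timelike"
    return inner(x, y) ** 2 - inner(x, x) * inner(y, y)
```

For $x = (1, 0, 2)$ and $y = (0, 1, 3)$ one finds $(x, y) = -6$, $(x^2) = -3$, $(y^2) = -8$, so the gap is $36 - 24 = 12$; for the proportional pair $x$, $2x$ the gap is zero.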


From Theorem 7.57 we obtain the following useful corollary. 


Corollary 7.58 Two timelike vectors x and y cannot be orthogonal. 




Proof Indeed, if $(x, y) = 0$, then from the inequality (7.82), it follows that
$(x^2) \cdot (y^2) \le 0$, and this contradicts the condition $(x^2) < 0$ and $(y^2) < 0$.


Similar to the partition of the light cone V into two poles, we can also partition
its interior into two parts. Namely, we shall say that timelike vectors e and $e'$ lie
inside one pole of the light cone V if the inner products $(e, x)$ and $(e', x)$ have the
same sign for all vectors $x \in V$ and lie inside different poles if these inner products
have opposite signs.

A set $M \subset L$ is said to be convex if for every pair of vectors $e, e' \in M$, the vectors
$g_t = te + (1 - t)e'$ are also in M for all $t \in [0, 1]$. We shall prove that the interior
of each pole of the light cone V is convex, that is, the vector $g_t$ lies in the same
pole as e and $e'$ for all $t \in [0, 1]$. To this end, let us note that in the expression
$(g_t, x) = t(e, x) + (1 - t)(e', x)$, the coefficients t and $1 - t$ are nonnegative, and
the inner products $(e, x)$ and $(e', x)$ have the same sign. Therefore, for every vector
$x \in V$, the inner product $(g_t, x)$ has the same sign as $(e, x)$ and $(e', x)$.


Lemma 7.59 Timelike vectors e and $e'$ lie inside one pole of the light cone V if and
only if $(e, e') < 0$.


Proof If timelike vectors e and $e'$ lie inside one pole, then by definition, we have
the inequality $(e, x) \cdot (e', x) > 0$ for all $x \in V$. Let us assume that $(e, e') > 0$. As we
established above, the vector $g_t = te + (1 - t)e'$ is timelike and lies inside the same
pole as e and $e'$ for all $t \in [0, 1]$.

Let us consider the inner product $(g_t, e) = t(e, e) + (1 - t)(e, e')$ as a function
of the variable $t \in [0, 1]$. It is obvious that this function is continuous and that it
assumes for $t = 0$ the value $(e, e') > 0$, and for $t = 1$ the value $(e, e) < 0$. Therefore,
as is proved in a course in calculus, there exists a value $t \in [0, 1]$ such that
$(g_t, e) = 0$. But this contradicts Corollary 7.58.

Thus we have proved that if vectors e and $e'$ lie inside one pole of the cone V,
then $(e, e') < 0$. The converse assertion is obvious. Let e and $e'$ lie inside different
poles, for instance, e is within $V_+$, while $e'$ is within $V_-$. Then we have by definition
that the vector $-e'$ lies inside the pole $V_+$, and therefore, $(e, -e') < 0$, that is,
$(e, e') > 0$.
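Lemma 7.59 replaces a condition on all lightlike vectors by a single inner product. The sketch below, in Python, illustrates this on a finite sample; the sample of cone vectors and the function names are assumptions, and checking a few vectors x is of course weaker than the definition, which quantifies over the whole cone V.

```python
def inner(u, v):
    """x1*y1 + ... + x_{n-1}*y_{n-1} - x_n*y_n in an orthonormal basis."""
    return sum(a * b for a, b in zip(u[:-1], v[:-1])) - u[-1] * v[-1]

def inside_one_pole(e1, e2, cone_samples):
    """Definition from the text, restricted to the sampled lightlike vectors."""
    return all(inner(e1, x) * inner(e2, x) > 0 for x in cone_samples)

def lemma_criterion(e1, e2):
    """Lemma 7.59: timelike e1, e2 lie inside one pole iff (e1, e2) < 0."""
    return inner(e1, e2) < 0

CONE_SAMPLES = [(3, 4, 5), (0, 1, 1), (1, 0, -1)]  # hand-picked lightlike vectors
```

For instance, $e = (0, 0, 1)$ and $e' = (1, 0, 2)$ pass both tests, while $e = (0, 0, 1)$ and $e' = (1, 1, -3)$ fail both.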


7.8 Lorentz Transformations 


In this section, we shall examine an analogue of orthogonal transformations for
pseudo-Euclidean spaces called Lorentz transformations. Such transformations have
numerous applications in physics.⁹ They are also defined by the condition of
preserving the inner product.


⁹ For example, a Lorentz transformation of Minkowski space, a four-dimensional
pseudo-Euclidean space, plays the same role in the special theory of relativity (which is where the term
Lorentz transformation comes from) as that played by the Galilean transformations, which describe
the passage from one inertial reference frame to another in classical Newtonian mechanics.




Definition 7.60 A linear transformation U of a pseudo-Euclidean space L is called 
a Lorentz transformation if the relationship 


(U(x), U(y)) = (x, y) (7.84) 


is satisfied for all vectors $x, y \in L$.


As in the case of orthogonal transformations, it suffices that the condition (7.84) 
be satisfied for all vectors x = y of the pseudo-Euclidean space L. The proof of this 
coincides completely with the proof of the analogous assertion in Sect. 7.2. 

Here, as in the case of Euclidean spaces, we shall make use of the inner product
(x, y) in order to identify $L^*$ with L (let us recall that for this, we need only the
nonsingularity of the bilinear form (x, y) and not the positive definiteness of the
associated quadratic form $(x^2)$). As a result, for an arbitrary linear transformation
$\mathcal{A}: L \to L$, we may consider $\mathcal{A}^*$ also as a transformation of the space L into itself.
Repeating the same arguments that we employed in the case of Euclidean spaces,
we obtain that $|\mathcal{A}^*| = |\mathcal{A}|$. In particular, from definition (7.84), it follows that for a
Lorentz transformation $\mathcal{U}$, we have the relationship

$$U^* A U = A, \tag{7.85}$$

where $U$ is the matrix of the transformation $\mathcal{U}$ in an arbitrary basis $e_1, \ldots, e_n$ of the
space L, and $A = (a_{ij})$ is the Gram matrix of the bilinear form (x, y), that is, the
matrix with elements $a_{ij} = (e_i, e_j)$.

The bilinear form (x, y) is nonsingular, that is, $|A| \ne 0$, and from the relationship
(7.85) follows the equality $|U|^2 = 1$, from which we obtain that $|U| = \pm 1$. As in
the case of a Euclidean space, a transformation with determinant equal to 1 is called
proper, while if the determinant is equal to $-1$, it is improper.
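Relationship (7.85) gives a direct matrix test for a Lorentz transformation. The following minimal sketch in Python assumes that $U^*$ in (7.85) denotes the transposed matrix; the helper names and the two sample matrices are illustrative and do not come from the text.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(p, q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)] for row in p]

def is_lorentz(u, gram):
    """Check (7.85), reading U* as the transpose of U (an assumption on notation)."""
    return matmul(matmul(transpose(u), gram), u) == gram

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

GRAM = [[1, 0], [0, -1]]  # orthonormal basis with (e1^2) = 1, (e2^2) = -1
BOOST = [[Fraction(5, 4), Fraction(3, 4)], [Fraction(3, 4), Fraction(5, 4)]]
FLIP = [[1, 0], [0, -1]]  # e1 -> e1, e2 -> -e2
```

Here `BOOST` satisfies (7.85) and has determinant 1, so it is proper, while `FLIP` satisfies (7.85) with determinant −1 and is improper. A matrix such as `[[2, 0], [0, 1]]` fails the test.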

It follows from the definition that every Lorentz transformation maps the light
cone V into itself. It follows from Theorem 7.55 that a Lorentz transformation $\mathcal{U}$ either
maps each pole into itself (that is, $\mathcal{U}(V_+) = V_+$ and $\mathcal{U}(V_-) = V_-$), or else
interchanges them (that is, $\mathcal{U}(V_+) = V_-$ and $\mathcal{U}(V_-) = V_+$). Let us associate with each
Lorentz transformation $\mathcal{U}$ the number $\nu(\mathcal{U}) = +1$ in the first case, and $\nu(\mathcal{U}) = -1$
in the second. Like the determinant $|\mathcal{U}|$, this number $\nu(\mathcal{U})$ is a natural
characteristic of the associated Lorentz transformation. Let us denote the pair of numbers
$(|\mathcal{U}|, \nu(\mathcal{U}))$ by $\varepsilon(\mathcal{U})$. It is obvious that

$$\varepsilon(\mathcal{U}^{-1}) = \varepsilon(\mathcal{U})^{-1}, \quad \varepsilon(\mathcal{U}_1 \mathcal{U}_2) = \varepsilon(\mathcal{U}_1)\varepsilon(\mathcal{U}_2),$$

where on the right-hand side, it is understood that the first and second components
of the pairs are multiplied separately. We shall soon see that in an arbitrary
pseudo-Euclidean space, there exist Lorentz transformations $\mathcal{U}$ of all four types, that is,
with $\varepsilon(\mathcal{U})$ taking all values

$$(+1, +1), \quad (+1, -1), \quad (-1, +1), \quad (-1, -1).$$

This property is sometimes interpreted as saying that a pseudo-Euclidean space has
not two (as in the case of a Euclidean space), but four orientations.

Like orthogonal transformations of a Euclidean space, Lorentz transformations
are characterized by the fact that they map an orthonormal basis of a
pseudo-Euclidean space to an orthonormal basis. Indeed, suppose that for the vectors of
the orthonormal basis $e_1, \ldots, e_n$, the equalities

$$(e_i, e_j) = 0 \ \text{ for } i \ne j, \quad (e_1^2) = \cdots = (e_{n-1}^2) = 1, \quad (e_n^2) = -1 \tag{7.86}$$

are satisfied. Then from the condition (7.84), it follows that the images $\mathcal{U}(e_1), \ldots,
\mathcal{U}(e_n)$ satisfy analogous equalities, that is, they form an orthonormal basis in L.
Conversely, if for the vectors $e_i$, the equality (7.86) is satisfied and analogous
equalities hold for the vectors $\mathcal{U}(e_i)$, then as is easily verified, for arbitrary vectors x and
y of the pseudo-Euclidean space L, the relationship (7.84) is satisfied.

Two orthonormal bases are said to have the same orientation if for a Lorentz
transformation $\mathcal{U}$ taking one basis to the other, $\varepsilon(\mathcal{U}) = (+1, +1)$. The choice of
a class of bases with the same orientation is called an orientation of the
pseudo-Euclidean space L. Taking for now on faith the fact (which will be proved a little
later) that there exist Lorentz transformations $\mathcal{U}$ with all theoretically possible
$\varepsilon(\mathcal{U})$, we see that in a pseudo-Euclidean space, it is possible to introduce exactly
four orientations.


Example 7.61 Let us consider some concepts about pseudo-Euclidean spaces that
we encountered in Example 7.49, that is, for $\dim L = 2$ and $s = 1$. As we have seen,
in this space, there exists a basis $f_1, f_2$ for which the relationships $(f_1^2) = (f_2^2) = 0$,
$(f_1, f_2) = \frac{1}{2}$ are satisfied, and the scalar square of the vector $x = x f_1 + y f_2$ is
equal to $(x^2) = xy$. If $\mathcal{U}: L \to L$ is a Lorentz transformation given by the formula

$$x' = ax + by, \quad y' = cx + dy,$$

then the equality $(\mathcal{U}(x), \mathcal{U}(x)) = (x, x)$ for the vector $x = x f_1 + y f_2$ takes the
form $x'y' = xy$, that is, $(ax + by)(cx + dy) = xy$ for all x and y. From this, we
obtain

$$ac = 0, \quad bd = 0, \quad ad + bc = 1.$$

In view of the equality $ad + bc = 1$, the values $a = b = 0$ are impossible.
If $a \ne 0$, then $c = 0$, and this implies that $ad = 1$, that is, $d = a^{-1} \ne 0$ and $b = 0$.
Thus the transformation $\mathcal{U}$ has the form

$$x' = ax, \quad y' = a^{-1} y. \tag{7.87}$$

This is a proper transformation.
On the other hand, if $b \ne 0$, then $d = 0$, and this implies that $c = b^{-1}$, $a = 0$. The
transformation $\mathcal{U}$ has in this case the form

$$x' = by, \quad y' = b^{-1} x. \tag{7.88}$$




This is an improper transformation. 

If we write the transformation $\mathcal{U}$ in the form (7.87) or (7.88), depending on
whether it is proper or improper, then the sign of the number a or respectively b
indicates whether $\mathcal{U}$ interchanges the poles of the light cone or preserves each of
them. Namely, let us prove that the transformation (7.87) causes the poles to change
places if $a < 0$, and preserves them if $a > 0$. The transformation (7.88), on the other
hand, interchanges the poles if $b > 0$ and preserves them if $b < 0$. For instance, for
$b > 0$ it carries $f_1$ to the positive multiple $b^{-1} f_2$ of $f_2$, and the vectors $f_1$ and $f_2$
lie in different poles by Theorem 7.55, since $(f_1, f_2) = \frac{1}{2} > 0$.

By Theorem 7.55, the partition of the light cone V into two poles $V_+$ and $V_-$
does not depend on the choice of timelike vector, and therefore, by Lemma 7.59, we
need only determine the sign of the inner product $(e, \mathcal{U}(e))$ for an arbitrary timelike
vector e. Let $e = x f_1 + y f_2$. Then $(e^2) = xy < 0$. In the case that $\mathcal{U}$ is a proper
transformation, we have formula (7.87), from which it follows that

$$\mathcal{U}(e) = ax f_1 + a^{-1} y f_2, \quad (e, \mathcal{U}(e)) = \frac{1}{2}(a + a^{-1})xy.$$

Since $xy < 0$, the inner product $(e, \mathcal{U}(e))$ is negative if $a + a^{-1} > 0$, and positive if
$a + a^{-1} < 0$. But it is obvious that $a + a^{-1} > 0$ for $a > 0$, and $a + a^{-1} < 0$ for $a < 0$.
Thus for $a > 0$, we have $(e, \mathcal{U}(e)) < 0$, and by Lemma 7.59, the vectors e and $\mathcal{U}(e)$
lie inside one pole. Consequently, the transformation $\mathcal{U}$ preserves the poles of the
light cone. Analogously, for $a < 0$, we obtain $(e, \mathcal{U}(e)) > 0$, that is, e and $\mathcal{U}(e)$ lie
inside different poles, and therefore, the transformation $\mathcal{U}$ interchanges the poles.

The case of an improper transformation can be examined with the help of formula
(7.88). Reasoning analogously to what has gone before, we obtain from it the
relationships

$$\mathcal{U}(e) = by f_1 + b^{-1} x f_2, \quad (e, \mathcal{U}(e)) = \frac{1}{2}(b^{-1} x^2 + b y^2),$$

from which it is clear that now the sign of $(e, \mathcal{U}(e))$ coincides with the sign of the
number b.
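The sign analysis of this example can be reproduced numerically. The sketch below, in Python, works in coordinates with respect to the lightlike basis $f_1, f_2$; the function names are assumptions, and $\nu$ is computed through Lemma 7.59 with the particular timelike vector $e = f_1 - f_2$.

```python
from fractions import Fraction

def inner(u, v):
    """Inner product in the basis f1, f2, where (f1^2) = (f2^2) = 0, (f1, f2) = 1/2."""
    return (Fraction(u[0]) * v[1] + Fraction(u[1]) * v[0]) / 2

def proper(a):
    """Transformation (7.87): (x, y) -> (a*x, y/a)."""
    a = Fraction(a)
    return lambda v: (a * v[0], v[1] / a)

def improper(b):
    """Transformation (7.88): (x, y) -> (b*y, x/b)."""
    b = Fraction(b)
    return lambda v: (b * v[1], v[0] / b)

def preserves_form(u, samples):
    """Check (U(v), U(v)) == (v, v) on the given sample vectors."""
    return all(inner(u(v), u(v)) == inner(v, v) for v in samples)

def nu(u):
    """+1 if u keeps each pole of the light cone, -1 if it swaps them.

    By Lemma 7.59 it suffices to look at the sign of (e, u(e)) for one
    timelike vector; here e = f1 - f2, whose scalar square is -1.
    """
    e = (1, -1)
    return 1 if inner(e, u(e)) < 0 else -1
```

With these definitions, `nu(proper(2))` is +1 and `nu(proper(-2))` is −1, while `nu(improper(2))` is −1 and `nu(improper(-2))` is +1, since for (7.88) the sign of $(e, \mathcal{U}(e))$ is the sign of b.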


Example 7.62 It is sometimes convenient to use the fact that a Lorentz transformation
of a pseudo-Euclidean plane can be written in an alternative form, using
the hyperbolic sine and cosine. We saw earlier (formulas (7.87) and (7.88)) that in
the basis $f_1, f_2$ defined by the relationship (7.78), proper and improper Lorentz
transformations are given respectively by the equalities (in the improper case, after
renaming $b^{-1}$ as $b$)

$$\mathcal{U}(f_1) = a f_1, \quad \mathcal{U}(f_2) = a^{-1} f_2,$$
$$\mathcal{U}(f_1) = b f_2, \quad \mathcal{U}(f_2) = b^{-1} f_1.$$

From this, it is not difficult to derive that in the orthonormal basis $e_1, e_2$, related
to $f_1, f_2$ by formula (7.78), these transformations are given respectively by the
equalities

$$\mathcal{U}(e_1) = \frac{a + a^{-1}}{2} e_1 + \frac{a - a^{-1}}{2} e_2, \quad
\mathcal{U}(e_2) = \frac{a - a^{-1}}{2} e_1 + \frac{a + a^{-1}}{2} e_2, \tag{7.89}$$

$$\mathcal{U}(e_1) = \frac{b + b^{-1}}{2} e_1 - \frac{b - b^{-1}}{2} e_2, \quad
\mathcal{U}(e_2) = \frac{b - b^{-1}}{2} e_1 - \frac{b + b^{-1}}{2} e_2. \tag{7.90}$$

Setting here $a = \pm e^{\psi}$ and $b = \pm e^{\psi}$, where the sign $\pm$ coincides with the sign of the
number a or b in formula (7.89) or (7.90) respectively, we obtain that the matrices
of the proper transformations have the form

$$\begin{pmatrix} \cosh\psi & \sinh\psi \\ \sinh\psi & \cosh\psi \end{pmatrix} \quad \text{or} \quad
\begin{pmatrix} -\cosh\psi & -\sinh\psi \\ -\sinh\psi & -\cosh\psi \end{pmatrix}, \tag{7.91}$$

while the matrices of the improper transformations have the form

$$\begin{pmatrix} \cosh\psi & \sinh\psi \\ -\sinh\psi & -\cosh\psi \end{pmatrix} \quad \text{or} \quad
\begin{pmatrix} -\cosh\psi & -\sinh\psi \\ \sinh\psi & \cosh\psi \end{pmatrix}, \tag{7.92}$$

where $\sinh\psi = (e^{\psi} - e^{-\psi})/2$ and $\cosh\psi = (e^{\psi} + e^{-\psi})/2$ are the hyperbolic sine
and cosine.
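The passage from the parameters a and b to the hyperbolic form can be verified in floating-point arithmetic. The following minimal sketch in Python assumes the convention that the columns of a matrix are the coordinates of the images of $e_1$ and $e_2$; the function names and the tolerance are illustrative choices.

```python
import math

def proper_matrix(a):
    """Matrix of (7.89) in the basis e1, e2 (columns are the images of e1, e2)."""
    c, s = (a + 1 / a) / 2, (a - 1 / a) / 2
    return [[c, s], [s, c]]

def improper_matrix(b):
    """Matrix of (7.90) in the basis e1, e2 (columns are the images of e1, e2)."""
    c, s = (b + 1 / b) / 2, (b - 1 / b) / 2
    return [[c, s], [-s, -c]]

def hyperbolic_proper(psi, sign=1):
    """The two matrices of (7.91): sign = +1 gives the first, sign = -1 the second."""
    c, s = math.cosh(psi), math.sinh(psi)
    return [[sign * c, sign * s], [sign * s, sign * c]]

def close(m1, m2, tol=1e-12):
    """Entrywise comparison of two matrices up to a small tolerance."""
    return all(abs(p - q) <= tol for r1, r2 in zip(m1, m2) for p, q in zip(r1, r2))
```

For example, `proper_matrix(math.exp(0.7))` agrees with `hyperbolic_proper(0.7)` up to rounding, and `improper_matrix(1.0)` is the reflection sending $e_1$ to $e_1$ and $e_2$ to $-e_2$.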


Theorem 7.63 In every pseudo-Euclidean space there exist Lorentz transformations
$\mathcal{U}$ with all four possible values of $\varepsilon(\mathcal{U})$.

Proof For the case $\dim L = 2$, we have already proved the theorem: In Example 7.62,
we saw that there exist four distinct types of Lorentz transformation of a
pseudo-Euclidean space having in a suitable orthonormal basis the matrices (7.91),
(7.92). It is obvious that with these matrices, the transformation $\mathcal{U}$ gives all possible
values $\varepsilon(\mathcal{U})$.

Let us now move on to the general case $\dim L > 2$. Let us choose in the
pseudo-Euclidean space L an arbitrary timelike vector e and any $e'$ not proportional to it.
By Lemma 7.53, the two-dimensional space $\langle e, e' \rangle$ is a pseudo-Euclidean space
(therefore nondegenerate), and we have the decomposition

$$L = \langle e, e' \rangle \oplus \langle e, e' \rangle^\perp.$$

From the law of inertia, it follows that the space $\langle e, e' \rangle^\perp$ is a Euclidean space. In
Example 7.62, we saw that in the pseudo-Euclidean plane $\langle e, e' \rangle$, there exists a Lorentz
transformation $\mathcal{U}_1$ with arbitrary value $\varepsilon(\mathcal{U}_1)$. Let us define the transformation
$\mathcal{U}: L \to L$ as $\mathcal{U}_1$ in $\langle e, e' \rangle$ and $\mathcal{E}$ in $\langle e, e' \rangle^\perp$, that is, for a vector $x = y + z$, where
$y \in \langle e, e' \rangle$ and $z \in \langle e, e' \rangle^\perp$, we shall set $\mathcal{U}(x) = \mathcal{U}_1(y) + z$. Then $\mathcal{U}$ is clearly a
Lorentz transformation, and $\varepsilon(\mathcal{U}) = \varepsilon(\mathcal{U}_1)$.
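The construction in this proof can be carried out explicitly in an orthonormal basis. The sketch below, in Python, makes the following assumptions for illustration: the basis satisfies (7.86), the plane is $\langle e_1, e_n \rangle$, the sample 2 × 2 matrices are rational versions of (7.91) and (7.92), and $\nu$ is read off through Lemma 7.59 from the sign of $(e_n, \mathcal{U}(e_n))$.

```python
from fractions import Fraction

def extend(u1, n):
    """n x n matrix acting as the 2 x 2 matrix u1 on <e1, en>, identity elsewhere."""
    m = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    m[0][0], m[0][n - 1] = u1[0][0], u1[0][1]
    m[n - 1][0], m[n - 1][n - 1] = u1[1][0], u1[1][1]
    return m

def form(u, v):
    """Inner product for the orthonormal basis (7.86): diag(1, ..., 1, -1)."""
    return sum(a * b for a, b in zip(u[:-1], v[:-1])) - u[-1] * v[-1]

def is_lorentz(m):
    """True if the columns of m again form an orthonormal basis in the sense of (7.86)."""
    n, cols = len(m), list(zip(*m))
    expected = lambda i, j: 0 if i != j else (-1 if i == n - 1 else 1)
    return all(form(cols[i], cols[j]) == expected(i, j)
               for i in range(n) for j in range(n))

def det(m):
    """Laplace expansion along the first row; adequate for small matrices."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def epsilon(m):
    """Pair (|U|, nu(U)); (e_n, U(e_n)) = -m[n-1][n-1], so nu = +1 iff that entry is > 0."""
    return det(m), (1 if m[-1][-1] > 0 else -1)

P = [[Fraction(5, 4), Fraction(3, 4)], [Fraction(3, 4), Fraction(5, 4)]]    # as in (7.91)
Q = [[Fraction(5, 4), Fraction(3, 4)], [Fraction(-3, 4), Fraction(-5, 4)]]  # as in (7.92)

def negate(m):
    return [[-x for x in row] for row in m]
```

Extending `P`, `negate(P)`, `Q` and `negate(Q)` to dimension 4 yields Lorentz matrices whose pairs $\varepsilon$ are $(1, 1)$, $(1, -1)$, $(-1, -1)$ and $(-1, 1)$ respectively, so all four types occur.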


There is an analogue to Theorem 7.24 for Lorentz transformations. 


Theorem 7.64 If a space $L'$ is invariant with respect to a Lorentz transformation
$\mathcal{U}$, then its orthogonal complement $(L')^\perp$ is also invariant with respect to $\mathcal{U}$.




Proof The proof of this theorem is an exact repetition of the proof of Theorem 7.24,
since there, we did not use the positive definiteness of the quadratic form $(x^2)$
associated with the bilinear form (x, y), but only its nonsingularity. See Remark 7.25
on p. 227.


The study of a Lorentz transformation of a pseudo-Euclidean space is reduced to 
the analogous question for orthogonal transformations of a Euclidean space, based 
on the following result. 


Theorem 7.65 For every Lorentz transformation $\mathcal{U}$ of a pseudo-Euclidean space
L, there exist nondegenerate subspaces $L_0$ and $L_1$ invariant with respect to $\mathcal{U}$ such
that L has the orthogonal decomposition

$$L = L_0 \oplus L_1, \quad L_0 \perp L_1, \tag{7.93}$$

where the subspace $L_0$ is a Euclidean space, and the dimension of $L_1$ is equal to 1,
2, or 3.


It follows from the law of inertia that if $\dim L_1 = 1$, then $L_1$ is spanned by a
timelike vector. If $\dim L_1 = 2$ or 3, then the pseudo-Euclidean space $L_1$ can be
represented in turn by a direct sum of subspaces of lower dimension invariant with
respect to $\mathcal{U}$. However, such a decomposition is no longer necessarily orthogonal
(see Example 7.48).


Proof of Theorem 7.65 The proof is by induction on n, the dimension of the space L.
For $n = 2$, the assertion of the theorem is obvious: in the decomposition (7.93) one
has only to set $L_0 = (0)$ and $L_1 = L$.¹⁰

Now let n > 2, and suppose that the assertion of the theorem has been proved for 
all pseudo-Euclidean spaces of dimension less than n. We shall use results obtained 
in Chaps. 4 and 5 on linear transformations of a vector space into itself. Obviously, 
one of the following three cases must hold: the transformation U has a complex 
eigenvalue, U has two linearly independent eigenvectors, or the space L is cyclic 
for U, corresponding to the only real eigenvalue. Let us consider the three cases 
separately. 


Case 1. The linear transformation $\mathcal{U}$ of the real vector space L has a complex
eigenvalue $\lambda$. As established in Sect. 4.3, then $\mathcal{U}$ also has the complex conjugate
eigenvalue $\bar{\lambda}$, and moreover, to the pair $\lambda, \bar{\lambda}$ there corresponds the two-dimensional real
invariant subspace $L' \subset L$, which contains no real eigenvectors. It is obvious that $L'$
cannot be a pseudo-Euclidean space: for then the restriction of $\mathcal{U}$ to $L'$ would have
real eigenvalues, and $L'$ would contain real eigenvectors of the transformation $\mathcal{U}$;
see Examples 7.61 and 7.62. Let us show that $L'$ is nondegenerate.


¹⁰ The nondegeneracy of the subspace $L_0 = (0)$ relative to a bilinear form follows from the
definitions given on pages 266 and 195. Indeed, the rank of the restriction of the bilinear form to the
subspace (0) is zero, and therefore, it coincides with dim(0).




Suppose that L’ is degenerate. Then it contains a lightlike vector e ≠ 0. Since U
is a Lorentz transformation, the vector U(e) is also lightlike, and since the subspace 
L’ is invariant with respect to U, it follows that U(e) is contained in L’. Therefore, 
the subspace L’ contains two lightlike vectors: e and U(e). By Lemma 7.53, these 
vectors cannot be linearly independent, since then L’ would be a pseudo-Euclidean 
space, but that would contradict our assumption that L’ is degenerate. From this, it 
follows that the vector U(e) is proportional to e, and that implies that e is an eigen- 
vector of the transformation U, which, as we have observed above, cannot be. This 
contradiction means that the subspace L’ is nondegenerate, and as a consequence, it 
is a Euclidean space. 


Case 2. The linear transformation $\mathcal{U}$ has two linearly independent eigenvectors: $e_1$
and $e_2$. If at least one of them is not lightlike, that is, $(e_i^2) \ne 0$, then $L' = \langle e_i \rangle$ is
a nondegenerate invariant subspace of dimension 1. And if both eigenvectors $e_1$
and $e_2$ are lightlike, then by Lemma 7.53, the subspace $L' = \langle e_1, e_2 \rangle$ is an invariant
pseudo-Euclidean plane.

Thus in both cases, the transformation $\mathcal{U}$ has a nondegenerate invariant subspace
$L'$ of dimension 1 or 2. This means that in both cases, we have an orthogonal
decomposition (7.73), that is, $L = L' \oplus (L')^\perp$. If $L'$ is one-dimensional and spanned by
a timelike vector or is a pseudo-Euclidean plane, then this is exactly decomposition
(7.93) with $L_0 = (L')^\perp$ and $L_1 = L'$. In the opposite case, the subspace $L'$ is a
Euclidean space of dimension 1 or 2, and the subspace $(L')^\perp$ is a pseudo-Euclidean
space of dimension $n - 1$ or $n - 2$ respectively. By the induction hypothesis, for
$(L')^\perp$, we have the orthogonal decomposition $(L')^\perp = L_0' \oplus L_1'$ analogous to (7.93).
From this, for L we obtain the decomposition (7.93) with $L_0 = L' \oplus L_0'$ and $L_1 = L_1'$.


Case 3. The space L is cyclic for the transformation $\mathcal{U}$, corresponding to the unique
real eigenvalue $\lambda$ and principal vector e of grade $m = n$. Obviously, for $n = 2$, this
is impossible: as we saw in Example 7.61, in a suitable basis of a pseudo-Euclidean
plane, a Lorentz transformation has either diagonal form (7.87) or the form (7.88)
with distinct eigenvalues $\pm 1$. In both cases, it is obvious that the pseudo-Euclidean
plane L cannot be a cyclic subspace of the transformation $\mathcal{U}$.

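Let us consider the case of a pseudo-Euclidean space L of dimension $n \ge 3$. We
shall prove that L can be a cyclic subspace of the transformation $\mathcal{U}$ only if $n = 3$.

As we established in Sect. 5.1, in a cyclic subspace L, there is a basis $e_1, \ldots, e_n$
defined by formula (5.5), that is,

$$e_1 = e, \quad e_2 = (\mathcal{U} - \lambda\mathcal{E})(e), \quad \ldots, \quad e_n = (\mathcal{U} - \lambda\mathcal{E})^{n-1}(e), \tag{7.94}$$

in which relationships (5.6) hold:

$$\mathcal{U}(e_1) = \lambda e_1 + e_2, \quad \mathcal{U}(e_2) = \lambda e_2 + e_3, \quad \ldots, \quad \mathcal{U}(e_n) = \lambda e_n. \tag{7.95}$$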




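In this basis, the matrix of the transformation $\mathcal{U}$ has the form of a Jordan block

$$U = \begin{pmatrix}
\lambda & 0 & 0 & \cdots & 0 & 0 \\
1 & \lambda & 0 & \cdots & 0 & 0 \\
0 & 1 & \lambda & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \ddots & \lambda & 0 \\
0 & 0 & 0 & \cdots & 1 & \lambda
\end{pmatrix}. \tag{7.96}$$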

It is easy to see that the eigenvector $e_n$ is lightlike. Indeed, if we had $(e_n^2) \ne 0$,
then we would have the orthogonal decomposition $L = \langle e_n \rangle \oplus \langle e_n \rangle^\perp$, where both
subspaces $\langle e_n \rangle$ and $\langle e_n \rangle^\perp$ are invariant. But this contradicts the assumption that the
space L is cyclic.

Since $\mathcal{U}$ is a Lorentz transformation, it preserves the inner product of vectors,
and from (7.95), we obtain the equality

$$(e_i, e_n) = (\mathcal{U}(e_i), \mathcal{U}(e_n)) = (\lambda e_i + e_{i+1}, \lambda e_n)
= \lambda^2 (e_i, e_n) + \lambda (e_{i+1}, e_n) \tag{7.97}$$

for all $i = 1, \ldots, n - 1$.
If $\lambda^2 \neq 1$, then from (7.97), it follows that

$$(e_i, e_n) = \frac{\lambda}{1 - \lambda^2}\,(e_{i+1}, e_n).$$

Substituting into this equality the values of the index $i = n - 1, \ldots, 1$, taking into
account that $(e_n^2) = 0$, we therefore obtain step by step that $(e_i, e_n) = 0$ for all $i$.
This means that the eigenvector $e_n$ is contained in the radical of the space $L$, and
since $L$ is a pseudo-Euclidean space (that is, in particular, nondegenerate), it follows
that $e_n = 0$. This contradiction shows that $\lambda^2 = 1$.

Substituting $\lambda^2 = 1$ into the equalities (7.97) and collecting like terms, we find
that $(e_{i+1}, e_n) = 0$ for all indices $i = 1, \ldots, n - 1$, that is, $(e_j, e_n) = 0$ for all indices
$j = 2, \ldots, n$. In particular, we have the equalities $(e_{n-1}, e_n) = 0$ for $n > 2$ and
$(e_{n-2}, e_n) = 0$ for $n > 3$. From this it follows that $n = 3$. Indeed, from the condition
of preservation of the inner product, we have the relationship

$$(e_{n-2}, e_{n-1}) = (\mathcal{U}(e_{n-2}), \mathcal{U}(e_{n-1})) = (\lambda e_{n-2} + e_{n-1}, \lambda e_{n-1} + e_n)
= \lambda^2 (e_{n-2}, e_{n-1}) + \lambda (e_{n-2}, e_n) + \lambda (e_{n-1}^2) + (e_{n-1}, e_n),$$

from which, taking into account the relationships $\lambda^2 = 1$ and $(e_{n-1}, e_n) = 0$, we
have the equality $(e_{n-2}, e_n) + (e_{n-1}^2) = 0$. If $n > 3$, then $(e_{n-2}, e_n) = 0$, and from
this, we obtain that $(e_{n-1}^2) = 0$, that is, the vector $e_{n-1}$ is lightlike.

Let us examine the subspace $L' = \langle e_n, e_{n-1}\rangle$. It is obvious that it is invariant
with respect to the transformation $\mathcal{U}$, and since it contains two linearly independent
lightlike vectors $e_n$ and $e_{n-1}$, then by Lemma 7.53, the subspace $L'$ is a
pseudo-Euclidean space, and we obtain the decomposition $L = L' \oplus (L')^{\perp}$ as a direct sum
of two invariant subspaces. But this contradicts the fact that the space $L$ is cyclic.
Therefore, the transformation $\mathcal{U}$ can have cyclic subspaces only of dimension 3.
Putting together cases 1, 2, and 3, and taking into account the induction
hypothesis, we obtain the assertion of the theorem.


Combining Theorems 7.27 and 7.65, we obtain the following corollary. 


Corollary 7.66 For every Lorentz transformation of a pseudo-Euclidean space, there exists
an orthonormal basis in which the matrix of the transformation has block-diagonal
form with blocks of the following types:


1. blocks of order 1 with elements $\pm 1$;

2. blocks of order 2 of type (7.29);

3. blocks of order 2 of type (7.91)-(7.92);

4. blocks of order 3 corresponding to a three-dimensional cyclic subspace with
eigenvalue $\pm 1$.


It follows from the law of inertia that the matrix of a Lorentz transformation can 
contain not more than one block of type 3 or 4. 


Let us note as well that a block of type 4 corresponding to a three-dimensional
cyclic subspace cannot be brought into Jordan normal form in an orthonormal basis.
Indeed, as we saw earlier, a block of type 4 is brought into Jordan normal form in the
basis (7.94), where the eigenvector $e_n$ is lightlike, and therefore, it cannot belong to
any orthonormal basis.

With the proof of Theorem 7.65 we have established necessary conditions for a
Lorentz transformation to have a cyclic subspace: its dimension must be 3, the
corresponding eigenvalue must be equal to $\pm 1$, and the eigenvector must be lightlike.
Clearly, these necessary conditions are not sufficient, since in deriving them, we
used the equalities $(e_i, e_k) = (\mathcal{U}(e_i), \mathcal{U}(e_k))$ for only some of the vectors of the
basis (7.94). Let us show that Lorentz transformations with cyclic subspaces indeed
exist.


Example 7.67 Let us consider a vector space $L$ of dimension $n = 3$. Let us choose
in $L$ a basis $e_1, e_2, e_3$ and define a transformation $\mathcal{U} : L \to L$ using relationships
(7.95) with the number $\lambda = \pm 1$. Then the matrix of the transformation $\mathcal{U}$ will take
the form of a Jordan block with eigenvalue $\lambda$.

Let us choose the Gram matrix for a basis $e_1, e_2, e_3$ such that $L$ is given the
structure of a pseudo-Euclidean space. With the proof of Theorem 7.65, we have found
necessary conditions $(e_2, e_3) = 0$ and $(e_3^2) = 0$. Let us set $(e_1^2) = a$, $(e_1, e_2) = b$,
$(e_1, e_3) = c$, and $(e_2^2) = d$. Then the Gram matrix can be written as

$$A = \begin{pmatrix} a & b & c \\ b & d & 0 \\ c & 0 & 0 \end{pmatrix}. \tag{7.98}$$




On the other hand, as we know (see Example 7.51, p. 270), in $L$ there exists an
orthonormal basis in which the Gram matrix is diagonal and has determinant $-1$.
Since the sign of the determinant of the Gram matrix is one and the same for all
bases, it follows that $|A| = -c^2 d < 0$, that is, $c \neq 0$ and $d > 0$.

The conditions $c \neq 0$ and $d > 0$ are also sufficient for the vector space in which
the inner product is given by the Gram matrix $A$ in the form (7.98) to be a
pseudo-Euclidean space. Indeed, choosing a basis $g_1, g_2, g_3$ in which the quadratic form
associated with the matrix $A$ has canonical form (6.28), we see that the condition
$|A| < 0$ is satisfied by, besides a pseudo-Euclidean space, only a space in which
$(g_i^2) = -1$ for all $i = 1, 2, 3$. But such a quadratic form is negative definite, that is,
$(x^2) < 0$ for all vectors $x \neq 0$, and this contradicts that $(e_2^2) = d > 0$.

Let us now consider the equalities $(e_i, e_k) = (\mathcal{U}(e_i), \mathcal{U}(e_k))$ for all indices $i \le k$
from 1 to 3. Taking into account $\lambda^2 = 1$, $(e_2, e_3) = 0$, and $(e_3^2) = 0$, we see that they
are satisfied automatically except for the cases $i = k = 1$ and $i = 1$, $k = 2$. These
two cases give the relationships $2\lambda b + d = 0$ and $c + d = 0$. Thus we may choose
the number $a$ arbitrarily, the number $d$ to be any positive number, and set $c = -d$
and $b = -\lambda d/2$. It is also not difficult to ascertain that linearly independent vectors
$e_1, e_2, e_3$ satisfying such conditions in fact exist.
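
The relationships of this example can be checked mechanically. The following is a minimal sketch rather than part of the original text: it assumes that the matrix of the transformation is written with the coordinates of the image of the jth basis vector in its jth column, so that preservation of the inner product reads "U transposed times A times U equals A". The helper names (`jordan_block`, `gram`, `preserves`, `det3`) and the sample values a = 5, d = 3 are illustrative assumptions.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def jordan_block(lam):
    """Matrix of U from (7.95) for n = 3; column j holds the coordinates of U(e_j)."""
    return [[lam, 0, 0],
            [1, lam, 0],
            [0, 1, lam]]


def gram(a, b, c, d):
    """Gram matrix of the form (7.98)."""
    return [[a, b, c],
            [b, d, 0],
            [c, 0, 0]]


def preserves(U, A):
    """True when U^T A U == A, i.e. (U(e_i), U(e_k)) == (e_i, e_k) for all i, k."""
    return matmul(matmul(transpose(U), A), U) == A


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


a, d = Fraction(5), Fraction(3)          # a arbitrary, d > 0 (sample values)
for lam in (Fraction(1), Fraction(-1)):
    A = gram(a, -lam * d / 2, -d, d)     # b = -lam*d/2, c = -d
    assert preserves(jordan_block(lam), A)
    assert det3(A) == -d ** 3            # |A| = -c^2 d with c = -d
```

Exact rational arithmetic is used so that the equality of matrices is tested without rounding error; changing b or c away from the stated values makes `preserves` return `False`.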


Just as in a Euclidean space, the presence of different orientations of a
pseudo-Euclidean space determined by the value of $\varepsilon(\mathcal{U})$ for the Lorentz transformation
$\mathcal{U}$ is connected with the concept of continuous deformation of a transformation
(p. 230), which defines an equivalence relation on the set of transformations.

Let $\mathcal{U}_t$ be a family of Lorentz transformations continuously depending on the
parameter $t$. Then $|\mathcal{U}_t|$ also depends continuously on $t$, and since the determinant of
a Lorentz transformation is equal to $\pm 1$, the value of $|\mathcal{U}_t|$ is constant for all $t$. Thus
Lorentz transformations with determinants having opposite signs cannot be
continuously deformed into each other. But in contrast to orthogonal transformations of a
Euclidean space, Lorentz transformations $\mathcal{U}_t$ have an additional characteristic, the
number $\nu(\mathcal{U}_t)$ (see the definition on p. 276). Let us show that like the determinant
$|\mathcal{U}_t|$, the number $\nu(\mathcal{U}_t)$ is also constant.

To this end, let us choose an arbitrary timelike vector $e$ and make use of
Lemma 7.59. The vector $\mathcal{U}_t(e)$ is also timelike, and moreover, $\nu(\mathcal{U}_t) = +1$ if $e$ and
$\mathcal{U}_t(e)$ lie inside one pole of the light cone, that is, $(e, \mathcal{U}_t(e)) < 0$, and $\nu(\mathcal{U}_t) = -1$
if $e$ and $\mathcal{U}_t(e)$ lie inside different poles, that is, $(e, \mathcal{U}_t(e)) > 0$. It then remains to
observe that the function $(e, \mathcal{U}_t(e))$ depends continuously on the argument $t$, and
therefore can change sign only if for some value of $t$, it assumes the value zero. But
from inequality (7.82) for timelike vectors $x = e$ and $y = \mathcal{U}_t(e)$, there follows the
inequality

$$(e, \mathcal{U}_t(e))^2 \ge (e^2) \cdot (\mathcal{U}_t(e)^2) > 0,$$

showing that $(e, \mathcal{U}_t(e))$ cannot be zero for any value of $t$.
Thus taking into account Theorem 7.63, we see that the number of equivalence
classes of Lorentz transformations is certainly not less than four. Now we shall
show that there are exactly four. To begin with, we shall establish this for a
pseudo-Euclidean plane, and thereafter shall prove it for a pseudo-Euclidean space of
arbitrary dimension.


Example 7.68 The matrices (7.91), (7.92) presenting all possible Lorentz
transformations of a pseudo-Euclidean plane can be continuously deformed into the matrices

$$E = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
F_1 = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \quad
F_2 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad
F_3 = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \tag{7.99}$$

respectively. Indeed, we obtain the necessary continuous deformation if in the
matrices (7.91), (7.92) we replace the parameter $\psi$ by $(1 - t)\psi$, where $t \in [0, 1]$. It is
also clear that none of the four matrices (7.99) can be continuously deformed into
any of the others: any two of them differ either by the signs of their determinants
or in that one of them preserves the poles of the light cone, while the other causes
them to exchange places.
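
The claims of this example can be illustrated numerically. The sketch below is not from the original text and rests on two labeled assumptions: the orthonormal basis of the plane is taken with the first vector spacelike and the second timelike, and the component containing the identity is parametrized by the hyperbolic rotation with entries cosh and sinh (the function `boost`). The names `inner`, `is_lorentz`, and `epsilon` are likewise illustrative.

```python
import math


def inner(x, y):
    """Inner product in an orthonormal basis e1, e2 with (e1^2) = 1, (e2^2) = -1."""
    return x[0] * y[0] - x[1] * y[1]


def apply(U, x):
    """Image of the vector x under the matrix U (list of rows)."""
    return (U[0][0] * x[0] + U[0][1] * x[1], U[1][0] * x[0] + U[1][1] * x[1])


def is_lorentz(U, tol=1e-9):
    """True when U preserves the inner products of the basis vectors."""
    e = [(1, 0), (0, 1)]
    return all(abs(inner(apply(U, e[i]), apply(U, e[k])) - inner(e[i], e[k])) <= tol
               for i in range(2) for k in range(2))


def epsilon(U):
    """The pair (sign of the determinant, nu) for a Lorentz transformation of the plane."""
    det = U[0][0] * U[1][1] - U[0][1] * U[1][0]
    e2 = (0, 1)                                    # timelike basis vector
    nu = 1 if inner(e2, apply(U, e2)) < 0 else -1  # same pole iff (e, U(e)) < 0
    return (1 if det > 0 else -1, nu)


def boost(psi):
    """Assumed hyperbolic-rotation form of a matrix with epsilon = (+1, +1)."""
    return [[math.cosh(psi), math.sinh(psi)],
            [math.sinh(psi), math.cosh(psi)]]


E = [[1, 0], [0, 1]]
F1 = [[-1, 0], [0, -1]]
F2 = [[1, 0], [0, -1]]
F3 = [[-1, 0], [0, 1]]

# The four diagonal matrices are Lorentz transformations with pairwise distinct epsilon.
assert all(is_lorentz(M) for M in (E, F1, F2, F3))
assert len({epsilon(M) for M in (E, F1, F2, F3)}) == 4

# Replacing psi by (1 - t) * psi keeps the matrix Lorentz and epsilon fixed,
# and reaches E at t = 1.
psi = 1.5
for step in range(11):
    t = step / 10
    U_t = boost((1 - t) * psi)
    assert is_lorentz(U_t) and epsilon(U_t) == (1, 1)
assert boost(0.0) == E
```

With a different choice of which basis vector is timelike, the roles of the two diagonal matrices with determinant -1 are exchanged, but the four values of the pair remain distinct.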


In the general case, we have an analogue of Theorem 7.28. 


Theorem 7.69 Two Lorentz transformations $\mathcal{U}_1$ and $\mathcal{U}_2$ of a real
pseudo-Euclidean space are continuously deformable into each other if and only if
$\varepsilon(\mathcal{U}_1) = \varepsilon(\mathcal{U}_2)$.


Proof Just as in the case of Theorem 7.28, we begin with a more specific assertion:
we shall show that an arbitrary Lorentz transformation $\mathcal{U}$ for which

$$\varepsilon(\mathcal{U}) = (|\mathcal{U}|, \nu(\mathcal{U})) = (+1, +1) \tag{7.100}$$

holds can be continuously deformed into $\mathcal{E}$. Invoking Theorem 7.65, let us examine
the orthogonal decomposition (7.93), denoting by $\mathcal{U}_i$ the restriction of the
transformation $\mathcal{U}$ to the invariant subspace $L_i$, where $i = 0, 1$. We shall investigate three
cases in turn.


Case 1. In the decomposition (7.93), the dimension of the subspace $L_1$ is equal to
1, that is, $L_1 = \langle e\rangle$, where $(e^2) < 0$. Then to the subspace $L_1$ there corresponds
in the matrix of the transformation $\mathcal{U}$ a block of order 1 with $\sigma = +1$ or $-1$,
and $\mathcal{U}_0$ is an orthogonal transformation that depending on the sign of $\sigma$, can be
proper or improper, so that the condition $|\mathcal{U}| = \sigma|\mathcal{U}_0| = 1$ is satisfied. However,
it is easy to see that for $\sigma = -1$, we have $\nu(\mathcal{U}) = -1$ (since $(e, \mathcal{U}(e)) > 0$), and
therefore, the condition (7.100) leaves only the case $\sigma = +1$, and consequently, the
orthogonal transformation $\mathcal{U}_0$ is proper. Then $\mathcal{U}_1$ is the identity transformation (of
a one-dimensional space). By Theorem 7.28, the proper orthogonal transformation $\mathcal{U}_0$ is
continuously deformable into the identity, and therefore, the transformation $\mathcal{U}$ is
continuously deformable into $\mathcal{E}$.


Case 2. In the decomposition (7.93), the dimension of the subspace $L_1$ is equal to
2, that is, $L_1$ is a pseudo-Euclidean plane. Then as we established in Examples 7.62
and 7.68, in some orthonormal basis of the plane $L_1$, the matrix of the transformation
$\mathcal{U}_1$ has the form (7.92) and is continuously deformable into one of the four matrices
(7.99). It is obvious that the condition $\nu(\mathcal{U}) = 1$ is associated with only the matrix
$E$ and one of the matrices $F_2$, $F_3$, namely the one in which the eigenvalues $\pm 1$
correspond to the eigenvectors $g_{\pm}$ in such a way that $(g_+^2) < 0$ and $(g_-^2) > 0$. In
this case, it is obvious that we have the orthogonal decomposition $L_1 = \langle g_+\rangle \oplus \langle g_-\rangle$.

If the matrix of the transformation $\mathcal{U}_1$ is continuously deformable into $E$, then
the orthogonal transformation $\mathcal{U}_0$ is proper, and it follows that it is also continuously
deformable into the identity, which proves our assertion.

If the matrix of the transformation $\mathcal{U}_1$ is continuously deformable into $F_2$ or
$F_3$, then the orthogonal transformation $\mathcal{U}_0$ is improper, and consequently, its matrix
is continuously deformable into the matrix (7.32), which has the eigenvalue $-1$
corresponding to some eigenvector $h \in L_0$. From the orthogonal decomposition $L =
L_0 \oplus \langle g_+\rangle \oplus \langle g_-\rangle$, taking into account $(g_+^2) < 0$, it follows that the invariant plane
$L' = \langle g_-, h\rangle$ is a Euclidean space. The matrix of the restriction of $\mathcal{U}$ to $L'$ is equal
to $-E$, and is therefore continuously deformable into $E$. And this implies that the
transformation $\mathcal{U}$ is continuously deformable into $\mathcal{E}$.


Case 3. In the decomposition (7.93), the subspace $L_1$ is a cyclic three-dimensional
pseudo-Euclidean space with eigenvalue $\lambda = \pm 1$. This case was examined in detail
in Example 7.67, and we will use the notation introduced there. It is obvious that the
condition $\nu(\mathcal{U}) = 1$ is satisfied only for $\lambda = 1$, since otherwise, the transformation
$\mathcal{U}_1$ takes the lightlike eigenvector $e_3$ to $-e_3$, that is, it transposes the poles of the
light cone. Thus condition (7.100) corresponds to the Lorentz transformation $\mathcal{U}_1$
with the value $\varepsilon(\mathcal{U}_1) = (+1, +1)$ and proper orthogonal transformation $\mathcal{U}_0$.

Let us show that such a transformation $\mathcal{U}_1$ is continuously deformable into the
identity. Since $\mathcal{U}_0$ is obviously also continuously deformable into the identity, this
will give us the required assertion.

Thus let $\lambda = 1$. We shall fix in $L_1$ a basis $e_1, e_2, e_3$ satisfying the following
conditions introduced in Example 7.67:

$$(e_1^2) = a, \quad (e_1, e_2) = -\frac{d}{2}, \quad (e_1, e_3) = -d, \quad (e_2^2) = d, \quad (e_2, e_3) = (e_3^2) = 0 \tag{7.101}$$

with some numbers $a$ and $d > 0$. The Gram matrix $A$ in this basis has the form
(7.98) with $c = -d$ and $b = -d/2$, while the matrix $U_1$ of the transformation $\mathcal{U}_1$
has the form of a Jordan block.




Let $\mathcal{U}_t$ be a linear transformation of the space $L_1$ whose matrix in the basis
$e_1, e_2, e_3$ has the form

$$U_t = \begin{pmatrix} 1 & 0 & 0 \\ t & 1 & 0 \\ g(t) & t & 1 \end{pmatrix}, \tag{7.102}$$

where $t$ is a real parameter taking values from 0 to 1, and $g(t)$ is a continuous
function of $t$ that we shall choose in such a way that $\mathcal{U}_t$ is a Lorentz transformation. As
we know, for this, the relationship (7.85) with matrix $U = U_t$ must be satisfied.
Substituting in the equality $U_t^{*} A U_t = A$ the matrix $A$ of the form (7.98) with $c = -d$
and $b = -d/2$ and matrix $U_t$ of the form (7.102) and equating corresponding
elements on the left- and right-hand sides, we obtain that the equality $U_t^{*} A U_t = A$
holds if $g(t) = t(t - 1)/2$. For such a choice of function $g(t)$, we obtain a family
of Lorentz transformations $\mathcal{U}_t$ depending continuously on the parameter $t \in [0, 1]$.
Moreover, it is obvious that for $t = 1$, the matrix $U_t$ is the Jordan block $U_1$, while
for $t = 0$, the matrix $U_t$ equals $E$. Thus the family $\mathcal{U}_t$ effects a continuous
deformation of the transformation $\mathcal{U}_1$ into $\mathcal{E}$.

Now let us prove the assertion of Theorem 7.69 in general form. Let $\mathcal{W}$ be a
Lorentz transformation with arbitrary $\varepsilon(\mathcal{W})$. We shall show that it can be
continuously deformed into the transformation $\mathcal{F}$, having in some orthonormal basis the
block-diagonal matrix

$$F = \begin{pmatrix} E & 0 \\ 0 & F' \end{pmatrix},$$

where $E$ is the identity matrix of order $n - 2$ and $F'$ is one of the four matrices
(7.99). It is obvious that by choosing a suitable matrix $F'$, we may obtain the Lorentz
transformation $\mathcal{F}$ with any desired $\varepsilon(\mathcal{F})$. Let us select the matrix $F'$ in such a way
that $\varepsilon(\mathcal{F}) = \varepsilon(\mathcal{W})$.

Let us select in our space an arbitrary orthonormal basis, and in that basis, let
the transformation $\mathcal{W}$ have matrix $W$. Then the transformation $\mathcal{U}$ having in this
same basis the matrix $U = WF$ is a Lorentz transformation, and moreover, by our
choice of $\varepsilon(\mathcal{F}) = \varepsilon(\mathcal{W})$, we have the equality $\varepsilon(\mathcal{U}) = \varepsilon(\mathcal{W})\varepsilon(\mathcal{F}) = (+1, +1)$.
Further, from the trivially verified relationship $F^{-1} = F$, we obtain $W = UF$, that is,
$\mathcal{W} = \mathcal{U}\mathcal{F}$. We shall now make use of a family $\mathcal{U}_t$ that effects the continuous
deformation of the transformation $\mathcal{U}$ into $\mathcal{E}$. From the equality $W = UF$, with the help
of Lemma 4.37, we obtain the relationship $\mathcal{W}_t = \mathcal{U}_t\mathcal{F}$, in which $\mathcal{W}_0 = \mathcal{E}\mathcal{F} = \mathcal{F}$
and $\mathcal{W}_1 = \mathcal{U}\mathcal{F} = \mathcal{W}$. Thus it is this family $\mathcal{W}_t = \mathcal{U}_t\mathcal{F}$ that accomplishes the
deformation of the Lorentz transformation $\mathcal{W}$ into $\mathcal{F}$.

If $\mathcal{U}_1$ and $\mathcal{U}_2$ are Lorentz transformations such that $\varepsilon(\mathcal{U}_1) = \varepsilon(\mathcal{U}_2)$, then by
what we showed earlier, each of them is continuously deformable into $\mathcal{F}$ with one
and the same matrix $F'$. Consequently, by transitivity, the transformations $\mathcal{U}_1$ and
$\mathcal{U}_2$ are continuously deformable into each other.
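
The one-parameter family used in Case 3 of the proof can be verified directly. The following minimal sketch, which is not part of the original text, assumes the lower-triangular matrix with rows (1, 0, 0), (t, 1, 0), (g(t), t, 1) and the Gram matrix of Example 7.67 with c = -d and b = -d/2; the function names and the sample values a = 5, d = 3 are illustrative.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def gram(a, d):
    """Gram matrix (7.98) with b = -d/2 and c = -d."""
    return [[a, -d / 2, -d],
            [-d / 2, d, 0],
            [-d, 0, 0]]


def family(t, g=None):
    """Matrix of the family at parameter t; by default g(t) = t(t - 1)/2."""
    if g is None:
        g = t * (t - 1) / 2
    return [[1, 0, 0],
            [t, 1, 0],
            [g, t, 1]]


def preserves(U, A):
    """True when U^T A U == A, the matrix form of preserving the inner product."""
    return matmul(matmul(transpose(U), A), U) == A


A = gram(Fraction(5), Fraction(3))
for t in (Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)):
    assert preserves(family(t), A)

# Endpoints: the identity at t = 0 and the Jordan block with eigenvalue 1 at t = 1.
assert family(Fraction(0)) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert family(Fraction(1)) == [[1, 0, 0], [1, 1, 0], [0, 1, 1]]

# A different choice of g breaks the Lorentz condition.
assert not preserves(family(Fraction(1, 2), g=Fraction(0)), A)
```

Only the (1, 1) entry of the matrix equation constrains g: it reduces to t^2 - t - 2g(t) = 0, which is why the last assertion fails for g = 0 at t = 1/2.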


Similarly to what we did in Sects. 4.4 and 7.3 for nonsingular and orthogonal
transformations, we can express the fact established by Theorem 7.69 in topological
form: the set of Lorentz transformations of a pseudo-Euclidean space of a given
dimension has exactly four path-connected components. They correspond to the four
possible values of $\varepsilon(\mathcal{U})$.

Let us note that the existence of four (instead of two) orientations is not a specific
property of pseudo-Euclidean spaces with the quadratic form (7.76), as was the case
with the majority of properties of this section. It holds for all vector spaces with a
bilinear inner product $(x, y)$, provided that it is nonsingular and the quadratic form
$(x^2)$ is neither positive nor negative definite. We can indicate (without pretending
to provide a proof) the reason for this phenomenon. If the form $(x^2)$, in canonical
form, appears as

$$x_1^2 + \cdots + x_s^2 - x_{s+1}^2 - \cdots - x_n^2, \quad \text{where } s \in \{1, \ldots, n - 1\},$$

then the transformations that preserve it include first of all, the orthogonal
transformations preserving the form $x_1^2 + \cdots + x_s^2$ and not changing the coordinates
$x_{s+1}, \ldots, x_n$, and secondly, the transformations preserving the quadratic form
$x_{s+1}^2 + \cdots + x_n^2$ and not changing the coordinates $x_1, \ldots, x_s$. Every type of
transformation is “responsible” for its own orientation.


Chapter 8 
Affine Spaces 


The usual objects of study in plane and solid geometry are the plane and three- 
dimensional space, both of which consist of points. However, vector spaces are 
logically simpler, and therefore, we began by studying them. Now we can move 
on to “point” (affine) spaces. The theory of such spaces is closely related to that 
of vector spaces, and so in this chapter, we shall be concerned only with questions 
relating specifically to this case. 


8.1 The Definition of an Affine Space 


Let us return to the starting point in the theory of vector spaces, namely to Sect. 3.1. 
There, we said that two points in the plane (or in space) determine a vector. We shall 
make this property the basis of the axiomatic definition of affine spaces. 


Definition 8.1 An affine space is a pair $(V, L)$ consisting of a set $V$ (whose elements
are called points) and a vector space $L$, on which a rule is defined whereby two points
$A, B \in V$ are associated with a vector of the space $L$, which we shall denote by $\overrightarrow{AB}$
(the order of the points $A$ and $B$ is significant). Here the following conditions must
be satisfied:

(1) $\overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}$.
(2) For every three points $A, B, C \in V$, there exists a unique point $D \in V$ such that

$$\overrightarrow{AB} = \overrightarrow{CD}. \tag{8.1}$$

(3) For every two points $A, B \in V$ and scalar $\alpha$, there exists a unique point $C \in V$
such that

$$\overrightarrow{AC} = \alpha\,\overrightarrow{AB}. \tag{8.2}$$


Remark 8.2 If $\overrightarrow{AB} = \overrightarrow{CD}$ as in condition (2), then we also have $\overrightarrow{AC} = \overrightarrow{BD}$. Indeed, in
view of condition (1), we have the equalities $\overrightarrow{AB} + \overrightarrow{BD} = \overrightarrow{AD}$ and $\overrightarrow{AC} + \overrightarrow{CD} =
\overrightarrow{AD}$. This implies that $\overrightarrow{AB} + \overrightarrow{BD} = \overrightarrow{AC} + \overrightarrow{CD}$ (see Fig. 8.1). Since $\overrightarrow{AB} = \overrightarrow{CD}$ by
assumption, and all vectors belong to the space $L$, it follows that $\overrightarrow{AC} = \overrightarrow{BD}$.

Fig. 8.1 Equality of vectors

I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry,
DOI 10.1007/978-3-642-30994-6_8, © Springer-Verlag Berlin Heidelberg 2013


From these conditions and the definition of a vector space, it is easy to derive
that for an arbitrary point $A \in V$, the vector $\overrightarrow{AA}$ is equal to 0, and for every pair of
points $A, B \in V$, we have the equality

$$\overrightarrow{BA} = -\overrightarrow{AB}.$$

It is equally easy to verify that if we are given a point $A \in V$ and a vector $x = \overrightarrow{AB}$
in the space $L$, then the point $B \in V$ is thereby uniquely determined.


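Theorem 8.3 The totality of all vectors of the form $\overrightarrow{AB}$, where $A$ and $B$ are
arbitrary points of $V$, forms a subspace $L'$ of the space $L$.


Proof Let $x = \overrightarrow{AB}$, $y = \overrightarrow{CD}$. By condition (2), there exists a point $K$ such that
$\overrightarrow{BK} = \overrightarrow{CD}$. Then by condition (1), the vector

$$\overrightarrow{AK} = \overrightarrow{AB} + \overrightarrow{BK} = \overrightarrow{AB} + \overrightarrow{CD} = x + y$$

is again contained in the subspace $L'$. Analogously, for any vector $x = \overrightarrow{AB}$ in $L'$,
condition (3) gives the vector $\overrightarrow{AC} = \alpha\,\overrightarrow{AB} = \alpha x$, which consequently also is
contained in $L'$.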


In view of Theorem 8.3, we shall require for the study of an affine space (V, L) 
not all the vectors of the space L, but only those that lie in the subspace L’. Therefore, 
in what follows, we shall denote the space L’ by L. In other words, we shall assume 
that the following condition is satisfied: for every vector $x \in L$, there exist points $A$
and $B$ in $V$ such that $x = \overrightarrow{AB}$.

This condition does not impose any additional constraints. It is simply equivalent 
to a change of notation: L instead of L’. 


Example 8.4 Every vector space $L$ defines an affine space $(L, L)$ if for two vectors
$a, b \in L$ considered as points of the set $V = L$, we set $\overrightarrow{ab} = b - a$. In particular, the
totality $\mathbb{K}^n$ of all rows of length $n$ defines an affine space.
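
For rows of numbers, the three conditions of Definition 8.1 can be tested on sample points. The sketch below is an illustration under stated assumptions, not part of the original text: points and vectors are both modeled as tuples, the vector from one point to another is the componentwise difference, and the helper names `vec`, `add`, `scale`, and `end_point` are hypothetical. It checks existence on a small grid of points; uniqueness follows from the arithmetic and is not tested.

```python
from fractions import Fraction
from itertools import product


def vec(A, B):
    """The vector from point A to point B, defined as B - A."""
    return tuple(b - a for a, b in zip(A, B))


def add(x, y):
    """Sum of two vectors."""
    return tuple(p + q for p, q in zip(x, y))


def scale(alpha, x):
    """Product of the scalar alpha and the vector x."""
    return tuple(alpha * p for p in x)


def end_point(C, x):
    """The point D for which the vector from C to D equals x."""
    return tuple(c + p for c, p in zip(C, x))


points = list(product(range(-1, 2), repeat=2))   # nine sample points of the plane

for A, B, C in product(points, repeat=3):
    # Condition (1): AB + BC = AC.
    assert add(vec(A, B), vec(B, C)) == vec(A, C)
    # Condition (2): a point D with AB = CD exists.
    D = end_point(C, vec(A, B))
    assert vec(C, D) == vec(A, B)
    # Remark 8.2: then AC = BD as well.
    assert vec(A, C) == vec(B, D)

alpha = Fraction(2, 3)
for A, B in product(points, repeat=2):
    # Condition (3): a point C with AC = alpha * AB exists.
    C = end_point(A, scale(alpha, vec(A, B)))
    assert vec(A, C) == scale(alpha, vec(A, B))
```

Keeping points and vectors in the same tuple type mirrors the pair $(L, L)$ of the example; in a general affine space the two sets are distinct, and only `vec` and `end_point` connect them.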




Example 8.5 The plane and space studied in a course in elementary or analytic 
geometry are examples of affine spaces. 


Condition (2) in the definition of an affine space shows that no matter how we
choose the point $O$ in the set $V$, every vector $x \in L$ can be represented as $x = \overrightarrow{OA}$.
Moreover, from the requirement of the uniqueness of the point $D$ in condition (2),
it follows that for a designated point $O$ and vector $x$, the point $A$ is uniquely
determined by the condition $\overrightarrow{OA} = x$. Thus having chosen (arbitrarily) a point $O \in V$
and associating with each point $A \in V$ the vector $\overrightarrow{OA}$, we obtain a bijection between
the points $A$ of the set $V$ and the vectors $x$ of the space $L$. In other words, an affine
space is a vector space in which the coordinate origin $O$ is not fixed. This notion is
more natural from a physical point of view; in an affine space, all points are created
equal, or in other words, the space is uniform. Mathematically, such a notion seems
more complex: we need to specify not one, but two sets: $V$ and $L$. And though we
write an affine space as a pair $(V, L)$, we shall often denote such a space simply by
$V$, leaving $L$ implied and assuming that the condition formulated above is satisfied.
In this case, we shall call $L$ the space of vectors of the affine space $V$.


Definition 8.6 The dimension of an affine space (V,L) is the dimension of the as- 
sociated vector space L. When we wish to focus our attention on the space V, then 
we shall denote the dimension by dim V. 


In the sequel, we shall consider only spaces of finite dimension. We shall call an
affine space of dimension 1 a line, and an affine space of dimension 2, a plane.

Having selected the point $O \in V$, we obtain a bijection $V \to L$. If $\dim L = n$
and we choose in the space $L$ some basis $e_1, \ldots, e_n$, then we have the isomorphism
$L \cong \mathbb{K}^n$. Thus for an arbitrary choice of a point $O \in V$ and basis in $L$, we obtain a
bijection $V \to \mathbb{K}^n$ and define each point $A$ of the affine space $V$ by the set of coordinates
$(\alpha_1, \ldots, \alpha_n)$ of the vector $x = \overrightarrow{OA}$ in the basis $e_1, \ldots, e_n$.

Definition 8.7 The point $O$ and basis $e_1, \ldots, e_n$ together are called a frame of
reference in the space $V$, and we write $(O; e_1, \ldots, e_n)$. The $n$-tuple $(\alpha_1, \ldots, \alpha_n)$
associated with the point $A \in V$ is called the coordinates of the point $A$ in the associated
frame of reference.


If relative to the frame of reference $(O; e_1, \ldots, e_n)$, the point $A$ has coordinates
$(\alpha_1, \ldots, \alpha_n)$, while the point $B$ has coordinates $(\beta_1, \ldots, \beta_n)$, then the vector $\overrightarrow{AB}$
has, with respect to the basis $e_1, \ldots, e_n$, coordinates $(\beta_1 - \alpha_1, \ldots, \beta_n - \alpha_n)$.


Just as with the selection of a basis in a vector space, every vector of that space is
determined by its coordinates, likewise is every point of an affine space determined
by its coordinates in a given frame of reference. Thus a frame of reference plays the
same role in the theory of affine spaces as that played by a basis in the theory of
vector spaces. We have defined frame of reference as a collection consisting of the
point $O$ and $n$ vectors $e_1, \ldots, e_n$ that form a basis of $L$. Any of these vectors $e_i$ can
be written in the form $e_i = \overrightarrow{OA_i}$, and then it is possible to give the frame of reference
as a collection of $n + 1$ points $O, A_1, \ldots, A_n$. Here the points $O, A_1, \ldots, A_n$ are not
arbitrary; they must satisfy the property that the vectors $\overrightarrow{OA_1}, \ldots, \overrightarrow{OA_n}$ form a basis
of $L$, that is, they must be linearly independent.
We have seen that the choice of a point $O$ in $V$ determines an isomorphism
between $V$ and $L$ that assigns to each point $A \in V$ the vector $\overrightarrow{OA} \in L$. Let us consider
how this correspondence changes when we change the point $O$. If we began with the
point $O'$, then we will have placed in correspondence with the point $A$, the vector
$\overrightarrow{O'A}$, which, by definition of an affine space, is equal to $\overrightarrow{O'O} + \overrightarrow{OA}$. Thus if in the
first case, we assign to the point $A$ the vector $x$, then in the second, we assign the
vector $x + a$, where $a = \overrightarrow{O'O}$. We obtain a corresponding mapping of the set $V$ if
to the point $A$, we assign the point $B$ such that $\overrightarrow{AB} = a$. Such a point $B$ is uniquely
determined by the choice of $A$ and $a$.


Definition 8.8 A translation of an affine space $(V, L)$ by a vector $a \in L$ is a mapping
of the set $V$ into itself that assigns to the point $A$ the point $B$ such that $\overrightarrow{AB} = a$. (The
existence and uniqueness of such a point $B \in V$ for every $A \in V$ and $a \in L$ follows
from the definition of affine space.)


We shall denote the translation by the vector $a$ by $\mathcal{T}_a$. Thus the definition of a
translation can be written as the formula

$$\mathcal{T}_a(A) = B, \quad \text{where } \overrightarrow{AB} = a.$$

From the given definition, a translation is an isomorphism of the set $V$ into itself. It
can be depicted with the help of the diagram

$$\begin{array}{ccc}
V & \xrightarrow{\ \mathcal{T}_a\ } & V \\
 & {\scriptstyle \psi'}\searrow \quad \swarrow{\scriptstyle \psi} & \\
 & L &
\end{array} \tag{8.3}$$

where the bijection $\psi$ between $V$ and $L$ is defined using the point $O$, while the
bijection $\psi'$ uses the point $O'$, and $\mathcal{T}_a$ is a translation by the vector $a = \overrightarrow{O'O}$. As a
result, the mapping $\psi'$ is the product (sequential application, or composition) of the
mappings $\mathcal{T}_a$ and $\psi$. This relationship can be more briefly written as $\psi' = \psi + a$.


Proposition 8.9 Translations possess the following properties:

(1) $\mathcal{T}_a \mathcal{T}_b = \mathcal{T}_{a+b}$,
(2) $\mathcal{T}_0 = \mathcal{E}$,
(3) $\mathcal{T}_{-a} = \mathcal{T}_a^{-1}$.




Proof In property (1), the left-hand side consists of the product of mappings, which
means that for every point $C \in V$, the equality

$$\mathcal{T}_a(\mathcal{T}_b(C)) = \mathcal{T}_{a+b}(C) \tag{8.4}$$

is satisfied. Let us represent the vector $b$ in the form $b = \overrightarrow{CP}$ (not only is this
possible, but by the definition of affine space, the point $P \in V$ is uniquely determined).
Then we have the equality $\mathcal{T}_b(C) = P$. Likewise, let us represent the vector $a$ in the
form $a = \overrightarrow{PQ}$. Then analogously, $\mathcal{T}_a(P) = Q$. It follows from these relationships
that

$$a + b = \overrightarrow{CP} + \overrightarrow{PQ} = \overrightarrow{CQ},$$

from which we obviously obtain $\mathcal{T}_{a+b}(C) = Q$. On the other hand, we have the
equality $\mathcal{T}_a(\mathcal{T}_b(C)) = \mathcal{T}_a(P) = Q$, which proves the relationship (8.4).
Properties (2) and (3) can be proved even more easily.


Let us note that for any two points $A, B \in V$, there exists a unique vector $a \in L$
for which $\mathcal{T}_a(A) = B$, namely, the vector $a = \overrightarrow{AB}$.
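
In the coordinate model where points and vectors are rows of numbers, the three properties of translations and the effect of changing the origin can be checked on examples. The following is a minimal sketch under that assumption, not part of the original text; `translation`, `vec`, `add`, and `neg` are hypothetical helper names and the sample rows are arbitrary.

```python
def vec(A, B):
    """The vector from point A to point B, defined as B - A."""
    return tuple(b - a for a, b in zip(A, B))


def add(x, y):
    """Sum of two vectors."""
    return tuple(p + q for p, q in zip(x, y))


def neg(x):
    """The vector opposite to x."""
    return tuple(-p for p in x)


def translation(a):
    """The translation by a: it sends a point A to the point B with AB = a."""
    def T(A):
        return tuple(p + q for p, q in zip(A, a))
    return T


a, b = (2, -1, 0), (1, 4, -3)
C = (5, 5, 5)
T_a, T_b = translation(a), translation(b)

assert T_a(T_b(C)) == translation(add(a, b))(C)   # property (1)
assert translation((0, 0, 0))(C) == C             # property (2)
assert translation(neg(a))(T_a(C)) == C           # property (3)

# Change of origin: psi(A) = OA, psi'(A) = O'A, and a = O'O.
O, O_prime, A = (1, 1, 1), (0, 3, -2), (4, 0, 7)
shift = vec(O_prime, O)
assert vec(O_prime, A) == add(vec(O, A), shift)            # psi' = psi + a
assert vec(O, translation(shift)(A)) == vec(O_prime, A)    # psi' is psi applied after T_a
```

The last two assertions express the same relationship in two ways: adding the vector from the new origin to the old one in $L$, or first translating the point in $V$ and then applying the original bijection.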


Suppose that we are given a certain frame of reference $(O; e_1, \ldots, e_n)$. Relative
to this frame of reference, every point $A \in V$ has coordinates $(x_1, \ldots, x_n)$. A
function $F(A)$ defined on the affine space $V$ and taking numeric values is called a
polynomial if it can be written as a polynomial in the coordinates $x_1, \ldots, x_n$.


This definition can be given a different formulation. Let us denote by $\psi : V \to L$
the bijection between $V$ and $L$ determined by the selection of an arbitrary point $O$.
Then the function $F$ on $V$ is a polynomial if it can be represented in the form
$F(A) = G(\psi(A))$, where $G(x)$ is a polynomial on the space $L$ (see the definition
on p. 127). To be sure, it is still necessary to verify that this definition does not
depend on the choice of frame of reference $(O; e_1, \ldots, e_n)$, but this can be done very
easily. If $\psi' : V \to L$ is a bijection between $V$ and $L$ determined by the choice of
point $O'$ (cf. diagram (8.3)), then $\psi' = \psi + a$. As we saw in Sect. 3.8, the property
of a function $G(x)$ being a polynomial does not depend on the choice of basis in $L$,
and it remains to verify that for a polynomial $G(x)$ and vector $a \in L$, the function
$G(x + a)$ is also a polynomial. It is clearly sufficient to verify this for the monomial
$c\,x_1^{k_1} \cdots x_n^{k_n}$. If the vector $x$ has coordinates $x_1, \ldots, x_n$, and the vector $a$ has
coordinates $\alpha_1, \ldots, \alpha_n$, then substituting them into the monomial $c\,x_1^{k_1} \cdots x_n^{k_n}$, we obtain
the expression $c\,(x_1 + \alpha_1)^{k_1} \cdots (x_n + \alpha_n)^{k_n}$, which is clearly also a polynomial in the
variables $x_1, \ldots, x_n$.

Using the same considerations as those employed in Example 3.86 on p. 130, we
may define for an arbitrary polynomial $F$ on an affine space $V$ its differential $d_O F$
at an arbitrary point $O \in V$. Here the differential $d_O F$ will be a linear function
on the space of vectors $L$ of the space $V$, that is, it will be a vector in the dual
space $L^*$. Indeed, let us consider the bijection $\psi : V \to L$ constructed earlier, for
which $\psi(O) = 0$; let us represent $F$ in the form $F(A) = G(\psi(A))$, where $G(x)$ is
some polynomial on the vector space $L$; and let us define $d_O F = d_0 G$ as a linear
function on $L$.




Suppose that we are given the frame of reference $(O; e_1, \ldots, e_n)$ in the space $V$.
Then $F(A)$ is a polynomial in the coordinates of the point $A$ with respect to this
frame of reference. Let us write down the expression $d_O F$ in these coordinates. By
definition, the differential

$$d_O F = d_0 G = \sum_{i=1}^{n} \frac{\partial G}{\partial x_i}(0)\, x_i$$

is a linear function in the coordinates $x_1, \ldots, x_n$ with respect to the basis $e_1, \ldots, e_n$.
Here $\partial G/\partial x_i$ is a polynomial, and it corresponds to some polynomial $\Phi_i$ on $V$,
that is, it has the form $\Phi_i(A) = \frac{\partial G}{\partial x_i}(\psi(A))$. By definition, we set $\Phi_i = \partial F/\partial x_i$. It is
easy to verify that if we express $F$ and $\Phi_i$ as polynomials in $x_1, \ldots, x_n$, then $\Phi_i$ will
indeed be the partial derivative of $F$ with respect to the variable $x_i$. Since $\psi(O) = 0$,
it follows that $\frac{\partial F}{\partial x_i}(O) = \frac{\partial G}{\partial x_i}(0)$. Consequently, we obtain for the differential $d_O F$,
the expression

$$d_O F = \sum_{i=1}^{n} \frac{\partial F}{\partial x_i}(O)\, x_i,$$

which is similar to formula (3.70) obtained in Sect. 3.8.


8.2 Affine Spaces 


Definition 8.10 A subset $V' \subset V$ of an affine space $(V, L)$ is an affine subspace if
the set of vectors $\overrightarrow{AB}$ for all $A, B \in V'$ forms a vector subspace $L'$ of the vector
space $L$.


It is obvious that then $V'$ itself is an affine space, and $L'$ is its space of vectors.
If $\dim V' = \dim V - 1$, then $V'$ is called a hyperplane in $V$.


Example 8.11 A typical example of an affine subspace is the set $V'$ of solutions of
the system of linear equations (1.3). If the coefficients $a_{ij}$ and constants $b_i$ of the
system of equations (1.3) lie in the field $\mathbb{K}$, then the set of solutions $V'$ is contained
in the set of rows $\mathbb{K}^n$ of length $n$, which we view as an affine space $(\mathbb{K}^n, \mathbb{K}^n)$, that
is, $V = \mathbb{K}^n$ and $L = \mathbb{K}^n$.

For a proof of the fact that the solution set $V'$ is an affine subspace, let us verify
that its space of vectors $L'$ is the solution space of the homogeneous system of linear
equations associated with (1.3). That the set of solutions of a linear homogeneous
system is a vector subspace of $\mathbb{K}^n$ was established in Sect. 3.1 (Example 3.8). Let
the rows $x$ and $y$ be solutions of the system (1.3), viewed now as points of the affine
space $V = \mathbb{K}^n$. We must verify that the vector $\overrightarrow{xy}$ defined as in Example 8.4
is contained in $L'$. But in accordance with this example, we must set $\overrightarrow{xy} = y - x$,
and it then remains for us to verify that the row $y - x$ belongs to the subspace $L'$,
that is, it is a solution of the homogeneous system associated with the system (1.3).
It suffices to verify this property separately for each equation. Let the $i$th equation
of the linear homogeneous system associated with (1.3) be given in the form (1.10),
that is, $F_i(x) = 0$, where $F_i$ is some linear function. By assumption, $x$ and $y$ are
solutions of the system (1.3), in particular, $F_i(x) = b_i$ and $F_i(y) = b_i$. From this it
follows that $F_i(y - x) = F_i(y) - F_i(x) = b_i - b_i = 0$, as asserted.
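
The argument of this example can be followed on a small concrete system. The sketch below is an illustration rather than part of the original text; the system of two equations in three unknowns, the two particular solutions, and the helper names `F` and `solves` are all assumptions chosen for the demonstration.

```python
def F(row, x):
    """Value of the linear function with coefficient row `row` at the point x."""
    return sum(c * xi for c, xi in zip(row, x))


def solves(coeffs, consts, x):
    """True when x satisfies every equation F_i(x) = b_i of the system."""
    return all(F(row, x) == b for row, b in zip(coeffs, consts))


# Sample system:  x1 + x2 + x3 = 3,  x1 - x2 = 0.
coeffs = [[1, 1, 1],
          [1, -1, 0]]
consts = [3, 0]

x = (1, 1, 1)
y = (0, 0, 3)
assert solves(coeffs, consts, x) and solves(coeffs, consts, y)

# The vector from x to y is y - x; it solves the associated homogeneous system.
diff = tuple(yi - xi for xi, yi in zip(x, y))
assert solves(coeffs, [0, 0], diff)

# Conversely, moving a solution by a multiple of such a vector gives another solution.
z = tuple(xi + 2 * di for xi, di in zip(x, diff))
assert solves(coeffs, consts, z)
```

The second assertion is exactly the computation $F_i(y - x) = b_i - b_i = 0$ carried out for each equation; the third shows that the solution set is carried into itself by translations along vectors of the homogeneous solution space.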


Example 8.12 Let us now prove that conversely, every affine subspace of the affine
space $(\mathbb{K}^n, \mathbb{K}^n)$ is defined by linear equations, that is, if $V'$ is an affine subspace,
then $V'$ coincides with the set of solutions of some system of linear equations.
Since $V'$ is a subspace of the affine space $(\mathbb{K}^n, \mathbb{K}^n)$, it follows by definition that
the corresponding set of vectors $L'$ is a subspace of the vector space $\mathbb{K}^n$. We saw in
Sect. 3.1 (Example 3.8) that it is then defined in $\mathbb{K}^n$ by a homogeneous system of
linear equations


$$F_1(x) = 0, \quad F_2(x) = 0, \quad \ldots, \quad F_m(x) = 0. \tag{8.5}$$


Let us consider an arbitrary point $A \in V'$ and set $F_i(A) = b_i$ for all $i = 1, \ldots, m$.
We shall prove that then the subspace $V'$ coincides with the set of solutions of the
system


$$F_1(x) = b_1, \quad F_2(x) = b_2, \quad \ldots, \quad F_m(x) = b_m. \tag{8.6}$$


Indeed, let us take an arbitrary point $B \in V'$. Let the points $A$ and $B$ have coordinates
$A = (\alpha_1, \ldots, \alpha_n)$ and $B = (\beta_1, \ldots, \beta_n)$ in some frame of reference. Then the
coordinates of the vector $\overrightarrow{AB}$ are equal to $(\beta_1 - \alpha_1, \ldots, \beta_n - \alpha_n)$, and we know
that the point $B$ belongs to $V'$ if and only if the vector $x = \overrightarrow{AB}$ belongs to the subspace
$L'$, that is, satisfies equations (8.5). Now using the fact that the functions $F_i$
are linear, we obtain that for any one of them,


$$F_i(\beta_1 - \alpha_1, \ldots, \beta_n - \alpha_n) = F_i(\beta_1, \ldots, \beta_n) - F_i(\alpha_1, \ldots, \alpha_n) = F_i(B) - b_i.$$


This implies that the point $B$ belongs to the affine subspace $V'$ if and only if $F_i(B) = b_i$,
that is, its coordinates satisfy equations (8.6).
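
The construction of Example 8.12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the homogeneous equations `F` describing $L'$, the point `A`, and the helper names `evaluate` and `in_subspace` are hypothetical and chosen for the example. The right-hand sides of system (8.6) are obtained as $b_i = F_i(A)$, and membership of a point $B$ in $V'$ is tested by checking $F_i(B) = b_i$.

```python
from fractions import Fraction

def evaluate(F, point):
    """Values F_1(point), ..., F_m(point) of the linear functions given by the rows of F."""
    return [sum(Fraction(c) * Fraction(p) for c, p in zip(row, point)) for row in F]

# Hypothetical data: L' is the line in K^3 cut out by x1 - x2 = 0 and x2 - x3 = 0,
# that is, L' is spanned by (1, 1, 1); A is a chosen point of the affine subspace V'.
F = [[1, -1, 0],
     [0, 1, -1]]
A = [1, 0, 2]

# Right-hand sides of system (8.6): b_i = F_i(A).
b = evaluate(F, A)

def in_subspace(B):
    """B lies in V' exactly when F_i(B) = b_i for all i."""
    return evaluate(F, B) == b

B_inside = [4, 3, 5]    # A + 3 * (1, 1, 1), so the vector AB lies in L'
B_outside = [4, 3, 6]   # the vector from A to this point is not in L'
print(in_subspace(B_inside), in_subspace(B_outside))
```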


Definition 8.13 Affine subspaces V’ and V” are said to be parallel if they have the 
same set of vectors, that is, if L’ = L”. 


It is easy to see that two parallel subspaces either have no points in common or
else coincide. Indeed, suppose that $V'$ and $V''$ are parallel and the point $A$ belongs
to $V' \cap V''$. Since the spaces of vectors for $V'$ and $V''$ coincide, it follows that for
an arbitrary point $B \in V'$, there exists a point $C \in V''$ such that $\overrightarrow{AB} = \overrightarrow{AC}$. Hence,
taking into account the uniqueness of the point $D$ in the relationship (8.1) from the
definition of an affine space, it follows that $B = C$, which implies that $V' \subset V''$.
Since the definition of parallelism does not depend on the order of the subspaces $V'$
and $V''$, the opposite inclusion $V'' \subset V'$ holds as well, which yields that $V' = V''$.




Let $V'$ and $V''$ be two parallel subspaces, and let us choose in each of them a
point: $A \in V'$ and $B \in V''$. Setting the vector $\overrightarrow{AB}$ equal to $a$, we obtain, by definition
of the translation $\mathcal{T}_a$, that $\mathcal{T}_a(A) = B$.

Let us consider an arbitrary point $C \in V'$. It follows from the definition of parallelism
that there exists a point $D \in V''$ such that $\overrightarrow{AC} = \overrightarrow{BD}$. From this, it follows
easily that $\overrightarrow{CD} = \overrightarrow{AB} = a$; see Fig. 8.1 and Remark 8.2. But this implies that
$\mathcal{T}_a(C) = D$. In other words, $\mathcal{T}_a(V') \subset V''$. Similarly, we obtain that $\mathcal{T}_{-a}(V'') \subset V'$,
whence from properties 1, 2, and 3 of a translation, it follows that $V'' \subset \mathcal{T}_a(V')$.
This implies that $\mathcal{T}_a(V') = V''$, that is, any two parallel subspaces can be mapped
into each other by a translation. Conversely, it is easy to verify that affine subspaces
$V'$ and $\mathcal{T}_a(V')$ are parallel for any choice of $V'$ and $a$.

Let us consider two different points $A$ and $B$ of an affine space $(V, L)$. Then
the totality of all points $C$ whose existence is established by condition (3) in the
definition of affine space (with arbitrary scalars $\alpha$) forms, as is easy to see, an affine
subspace $V'$. The corresponding vector subspace $L'$ coincides with $\langle \overrightarrow{AB} \rangle$. Therefore,
$L'$, and hence also the affine space $(V', L')$, is one-dimensional. It is called the line
passing through the points $A$ and $B$.

The notion of a line is related to the general notion of affine subspace by the 
following result. 


Theorem 8.14 In order for a subset M of an affine space V defined over a field 
of characteristic different from 2 to be an affine subspace of V, it is necessary and 
sufficient that for every two points of M, the line passing through them be entirely 
contained in M. 


Proof The necessity of this condition is obvious. Let us prove its sufficiency. Let
us choose an arbitrary point $O \in M$. We need to prove that the set of vectors $\overrightarrow{OA}$,
where $A$ runs over all possible points of the set $M$, forms a subspace $L'$ of the
space of vectors $L$ of the affine space $(V, L)$. Then for any other point $B \in M$, the
vector $\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA}$ will lie in the subspace $L'$, whence $(M, L')$ will be an affine
subspace of the space $(V, L)$.

That the product of an arbitrary vector $\overrightarrow{OA}$ and arbitrary scalar $\alpha$ lies in $L'$ derives
from the condition that the line passing through $O$ and $A$ is contained in $M$. Let us verify
that the sum of two vectors $a = \overrightarrow{OA}$ and $b = \overrightarrow{OB}$ contained in $L'$ is also contained in $L'$.
For this, we shall need the condition that we required on the set of points of a line only for
$\alpha = 1/2$ (in order for us to be able to apply this condition, we have assumed that
the field $\mathbb{K}$ over which the affine space $V$ in question is defined is of characteristic
different from 2). Let $C$ be a point of the line passing through $A$ and $B$ such that
$\overrightarrow{AC} = \frac{1}{2}\overrightarrow{AB}$. By definition, along with each pair of points $A$ and $B$ of the set $M$, the
line passing through them also belongs to this set. Hence it follows in particular that
we have $C \in M$ and $\overrightarrow{OC} \in L'$. Let us denote the vector $\overrightarrow{OC}$ by $c$; see Fig. 8.2. Then
we have the equalities


$$b = \overrightarrow{OB} = \overrightarrow{OA} + \overrightarrow{AB} = a + \overrightarrow{AB}, \qquad c = \overrightarrow{OC} = \overrightarrow{OA} + \overrightarrow{AC} = a + \overrightarrow{AC},$$




Fig. 8.2 Vectors $\overrightarrow{OA}$, $\overrightarrow{OB}$, and $\overrightarrow{OC}$

Fig. 8.3 Independent points

and thus in our case, we have $\overrightarrow{AB} = b - a$ and $\overrightarrow{AC} = c - a$, which implies
$c - a = \frac{1}{2}(b - a)$, that is, $c = \frac{1}{2}(a + b)$. Consequently, the vector $a + b$ equals $2c$, and since
$c$ is in $L'$, the vector $a + b$ is also in $L'$.


Now let $A_0, A_1, \ldots, A_m$ be a collection of $m + 1$ points in the affine space $V$.
Let us consider the subspace


$$L' = \langle \overrightarrow{A_0A_1}, \overrightarrow{A_0A_2}, \ldots, \overrightarrow{A_0A_m} \rangle$$


of the space $L$. It does not depend on the choice of point $A_0$ among the given points
$A_0, A_1, \ldots, A_m$, and we may write it, for example, in the form $\langle \ldots, \overrightarrow{A_iA_j}, \ldots \rangle$ for
all $i$ and $j$, $0 \le i, j \le m$. The set $V'$ of all points $B \in V$ for which the vector
$\overrightarrow{A_0B}$ is in $L'$ forms an affine subspace whose space of vectors is $L'$. By definition,
$\dim V' \le m$, and moreover, $\dim V' = m$ if and only if $\dim L' = m$, that is, the vectors
$\overrightarrow{A_0A_1}, \overrightarrow{A_0A_2}, \ldots, \overrightarrow{A_0A_m}$ are linearly independent. This provides the basis for the
following definition.


Definition 8.15 Points $A_0, A_1, \ldots, A_m$ of an affine space $V$ for which


$$\dim \langle \overrightarrow{A_0A_1}, \overrightarrow{A_0A_2}, \ldots, \overrightarrow{A_0A_m} \rangle = m$$

are called independent.


For example, the points $A_0, A_1, \ldots, A_n$ (where $n = \dim V$) determine a frame of
reference if and only if they are independent. Two distinct points are independent,
as are three noncollinear points, and so on. See Fig. 8.3.
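
Definition 8.15 translates directly into a rank computation. The sketch below is one possible way to test independence in coordinates, assuming the points are given as tuples of rational numbers in some frame of reference; the function names `rank` and `independent` are hypothetical, and Gaussian elimination is simply a convenient method, not one prescribed by the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * p for a, p in zip(m[i], m[r])]
        r += 1
    return r

def independent(points):
    """Points A_0, ..., A_m are independent when the vectors A_0A_i have rank m."""
    a0 = points[0]
    vectors = [[c - c0 for c, c0 in zip(p, a0)] for p in points[1:]]
    return rank(vectors) == len(points) - 1

print(independent([(0, 0), (1, 0), (0, 1)]))   # three noncollinear points
print(independent([(0, 0), (1, 1), (2, 2)]))   # three collinear points
```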

The following theorem gives an important property of affine spaces, connecting 
them with the familiar space of elementary geometry. 


Theorem 8.16 There is a unique line passing through every pair of distinct points 
A and B of an affine space V. 




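Proof It is obvious that distinct points $A$ and $B$ are independent, and the line $V' \subset V$
containing them must coincide with the set of points $C \in V$ for which $\overrightarrow{AC} \in \langle \overrightarrow{AB} \rangle$
(instead of $\overrightarrow{AC}$, one could consider the vector $\overrightarrow{BC}$; it determines the same subspace
$V' \subset V$). If $\overrightarrow{AC} = \alpha\overrightarrow{AB}$ and $\overrightarrow{AC'} = \beta\overrightarrow{AB}$, then $\overrightarrow{CC'} = (\beta - \alpha)\overrightarrow{AB}$, whence it
follows that $V'$ is a line.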


Having selected on any line $P$ of the affine space $V$ the point $O$ (reference point)
and arbitrary point $E \in P$ not equal to $O$ (scale of measurement), we obtain for an
arbitrary point $A \in P$ the relationship


$$\overrightarrow{OA} = \alpha\,\overrightarrow{OE}, \tag{8.7}$$


where $\alpha$ is some scalar, that is, an element of the field $\mathbb{K}$ over which the affine space
$V$ under consideration is defined. The assignment $A \mapsto \alpha$, as is easily verified,
establishes a bijection between the points $A \in P$ and scalars $\alpha$. This correspondence,
of course, depends on the choice of points $O$ and $E$ on the line. In fact, we have here
a special case of the notion of coordinates relative to a frame of reference $(O; e)$ on
the affine line $P$, where $e = \overrightarrow{OE}$.

As a result, we may associate with any three collinear points $A$, $B$, and $C$ of an
affine space, excepting only the case $A = B = C$, a scalar $\alpha$, called the affine ratio of
the points $A$, $B$, and $C$ and denoted by $(A, B, C)$. This is accomplished as follows. If
$A \ne B$, then $\alpha$ is uniquely determined by the relationship $\overrightarrow{AC} = \alpha\overrightarrow{AB}$. In particular,
$\alpha = 1$ if $B = C$, and $\alpha = 0$ if $A = C$. If $A = B \ne C$, then we take $\alpha = \infty$. And if all
three points $A$, $B$, and $C$ coincide, then their affine ratio $(A, B, C)$ is undefined.

Using the concept of oriented length of a line segment, we can write the affine
ratio of three points using the following formula:


$$(A, B, C) = \frac{AC}{AB}, \tag{8.8}$$


where $AB$ denotes the signed length of $AB$, that is, $AB = |AB|$ if the point $A$ lies
to the left of $B$, and $AB = -|AB|$ if the point $A$ lies to the right of $B$. Here, of
course, in formula (8.8), we assume that $a/0 = \infty$ for every $a \ne 0$.

For the remainder of this section, we shall assume that V is a real affine space. 

In this case, obviously, the numbers $\alpha$ from relationship (8.7) corresponding to
the points of the line $P$ are real, and the relationship $\alpha < \gamma < \beta$ between numbers
on the real line carries over to the corresponding points of the line $P \subset V$. If these
numbers $\alpha$, $\beta$, and $\gamma$ correspond to the points $A$, $B$, and $C$, then we say that the
point $C$ lies between the points $A$ and $B$.

Despite the fact that the relationship $A \mapsto \alpha$ defined by formula (8.7) itself depends
on the choice of distinct points $O$ and $E$ on the line, the property of point $C$
that it lie between $A$ and $B$ does not depend on that choice (although with a different
choice of $O$ and $E$, the order of the points $A$ and $B$ might, of course, change). Indeed,
it is easy to verify that by replacing the point $O$ by $O'$, to each of the numbers
$\alpha$, $\beta$, and $\gamma$ is added one and the same term $\lambda$ corresponding to the vector $\overrightarrow{OO'}$, and




in replacing the point $E$ by $E'$, each of the numbers $\alpha$, $\beta$, and $\gamma$ is multiplied by
one and the same number $\mu \ne 0$ such that $\overrightarrow{OE} = \mu\overrightarrow{OE'}$. For both operations, the
relationship $\alpha < \gamma < \beta$ for the point $C$ and pair of points $A$ and $B$ is unchanged,
except that the numbers $\alpha$ and $\beta$ in this inequality may exchange places (if they are
multiplied by $\mu < 0$).

The property of a point C to lie between A and B is related to the affine ratio 
for three collinear points introduced above. Namely, it is obvious that in the case of 
a real space, the inequality (C, A, B) < 0 is satisfied if and only if the point C lies 
between A and B. 
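
The affine ratio and the betweenness criterion can be tried out in coordinates. The following is a minimal sketch, assuming points are given as tuples of rational numbers in some frame of reference; the function names `affine_ratio` and `lies_between` are hypothetical. The case $A = B$, where the ratio is infinite or undefined, is reported as an error rather than modeled, so `lies_between` tests only points $C$ different from $A$.

```python
from fractions import Fraction

def affine_ratio(A, B, C):
    """Scalar alpha with AC = alpha * AB for collinear points A != B."""
    ab = [Fraction(b) - Fraction(a) for a, b in zip(A, B)]
    ac = [Fraction(c) - Fraction(a) for a, c in zip(A, C)]
    if all(v == 0 for v in ab):
        raise ValueError("A = B: the ratio is infinite or undefined")
    # Read alpha off the first coordinate in which A and B differ.
    k = next(i for i, v in enumerate(ab) if v != 0)
    alpha = ac[k] / ab[k]
    if any(alpha * u != w for u, w in zip(ab, ac)):
        raise ValueError("points are not collinear")
    return alpha

def lies_between(C, A, B):
    """Criterion from the text: C lies between A and B iff (C, A, B) < 0."""
    return affine_ratio(C, A, B) < 0

A, B = (0, 0), (4, 2)
print(affine_ratio(A, B, (2, 1)))      # the midpoint of A and B
print(lies_between((2, 1), A, B))
print(lies_between((6, 3), A, B))      # a point beyond B
```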


Definition 8.17 The collection of all points on the line passing through the points 
A and B that lie between A and B together with A and B themselves is called the 
segment joining the points A and B and is denoted by [A, B]. Here the points A and 
B are called the endpoints of the segment, and by definition, they belong to it. 


Thus the segment is determined by two points A and B, but not by their order, 
that is, by definition [B, A] =[A, B]. 


Definition 8.18 A set $M \subset V$ is said to be convex if for every pair of points $A, B \in M$,
the set $M$ also contains the segment $[A, B]$.


The notion of convexity is related to the partition of an affine space $V$ by a
hyperplane $V'$ into two half-spaces, in analogy with the partition of a vector space
into two half-spaces constructed in Sect. 3.2. In order to define this partition, let
us denote by $L' \subset L$ the hyperplane corresponding to $V'$, and let us consider the
partition $L \setminus L' = L^+ \cup L^-$ introduced earlier, choose an arbitrary point $O' \in V'$, and
for a point $A \in V \setminus V'$, state that $A \in V^+$ or $A \in V^-$ depending on the half-space
($L^+$ or $L^-$) to which the vector $\overrightarrow{O'A}$ belongs.

A simple verification shows that the subsets $V^+$ and $V^-$ thus obtained depend
only on the half-spaces $L^+$ and $L^-$ and not on the choice of point $O' \in V'$. Obviously,
$V \setminus V' = V^+ \cup V^-$ and $V^+ \cap V^- = \varnothing$.


Theorem 8.19 The sets $V^+$ and $V^-$ are convex, but the entire set $V \setminus V'$ is not.


Proof Let us begin by verifying the assertion about the set $V^+$. Let $A, B \in V^+$.
This implies that the vectors $x = \overrightarrow{O'A}$ and $y = \overrightarrow{O'B}$ belong to the half-space $L^+$,
that is, they can be expressed in the form


$$x = \alpha e + u, \quad y = \beta e + v, \quad \alpha, \beta > 0, \quad u, v \in L', \tag{8.9}$$


for some fixed vector $e \notin L'$. Let us consider the vector $z = \overrightarrow{O'C}$ and write it in the
form


$$z = \gamma e + w, \quad w \in L'. \tag{8.10}$$




Assuming that the point $C$ lies between $A$ and $B$, let us prove that $z \in L^+$, that
is, that $\gamma > 0$. The given condition, that the point $C$ lies between $A$ and $B$, can
be written with the help of an association between the points on the line passing
through $A$ and $B$ and the numbers that are the coordinates in the frame of reference
$(O; \overrightarrow{OE})$ according to formula (8.7). Although this association depends on the
choice of points $O$ and $E$, the property itself of "lying between," as we have seen,
does not depend on this choice. Therefore, we may choose $O = A$ and $E = B$. Then
in our frame of reference, the point $A$ has coordinate 0, and the point $B$ has coordinate 1.
Let $C$ have coordinate $\lambda$. Since $C \in [A, B]$, it follows that $0 \le \lambda \le 1$. By
definition, $\overrightarrow{AC} = \lambda\overrightarrow{AB}$. But from the fact that


$$\overrightarrow{AC} = \overrightarrow{AO'} + \overrightarrow{O'C} = z - x, \qquad \overrightarrow{AB} = \overrightarrow{AO'} + \overrightarrow{O'B} = y - x,$$

we obtain the equality $z - x = \lambda(y - x)$, or equivalently, the equality

$$z = (1 - \lambda)x + \lambda y.$$


Using formulas (8.9) and (8.10), we obtain from the last equality the relationship
$\gamma = (1 - \lambda)\alpha + \lambda\beta$, from which, taking into account the inequalities $\alpha > 0$, $\beta > 0$,
and $0 \le \lambda \le 1$, it follows that $\gamma > 0$.

The convexity of the set V~ is proved in exactly the same way. 

We shall prove, finally, that the set $V \setminus V'$ is not convex. In view of the convexity
of $V^+$ and $V^-$, of interest to us is only the case in which the points $A$ and $B$ lie in
different half-spaces, for example, $A \in V^+$ and $B \in V^-$ (or conversely, $A \in V^-$ and
$B \in V^+$, but this case is completely analogous). The condition $A \in V^+$ and $B \in V^-$
means that in formulas (8.9), we have $\alpha > 0$ and $\beta < 0$. In analogy to what has gone
before, for an arbitrary point $C \in [A, B]$, let us construct the vector $z$ as was done
in (8.10), and thus obtain the equality $\gamma = (1 - \lambda)\alpha + \lambda\beta$. If the numbers $\alpha$ and
$\beta$ are of opposite sign, an elementary computation shows that there always exists
a number $\lambda \in [0, 1]$ such that $(1 - \lambda)\alpha + \lambda\beta = 0$, and this yields a point $C \in [A, B]$
that lies in $V'$, that is, outside $V \setminus V'$.
Thus the theorem is proved in its entirety.
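
The "elementary computation" at the end of the proof can be made explicit: solving $(1 - \lambda)\alpha + \lambda\beta = 0$ gives $\lambda = \alpha / (\alpha - \beta)$, which lies strictly between 0 and 1 when $\alpha$ and $\beta$ have opposite signs. The following is a small sketch in Python under that reading; the function names `crossing_parameter` and `gamma` are hypothetical.

```python
from fractions import Fraction

def crossing_parameter(alpha, beta):
    """Solve (1 - lam) * alpha + lam * beta = 0 for lam, for alpha, beta of opposite sign."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    if alpha * beta >= 0:
        raise ValueError("alpha and beta must have opposite signs")
    return alpha / (alpha - beta)

def gamma(alpha, beta, lam):
    """Coefficient of e for the point C with AC = lam * AB, as in the proof."""
    return (1 - lam) * Fraction(alpha) + lam * Fraction(beta)

# A in V+ (alpha = 3) and B in V- (beta = -1): the segment meets the hyperplane.
lam = crossing_parameter(3, -1)
print(lam, gamma(3, -1, lam))

# Both endpoints in V+: gamma stays positive along the whole segment.
print(all(gamma(3, 5, Fraction(k, 10)) > 0 for k in range(11)))
```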


Thus the set $V^+$ is characterized by the property that every pair of its points are
connected by a segment lying entirely within it. This holds as well for the set $V^-$. At
the same time, no two points $A \in V^+$ and $B \in V^-$ can be joined by a segment that
does not intersect the hyperplane $V'$. This consideration gives another definition of
the partition $V \setminus V' = V^+ \cup V^-$, one that does not appeal to vector spaces.

Let us consider the sequence of subspaces


$$V_0 \subset V_1 \subset V_2 \subset \cdots \subset V_n = V, \qquad \dim V_i = i. \tag{8.11}$$


From the last condition, it follows that $V_{i-1}$ is a hyperplane in $V_i$, and this implies
that the partition defined by $V_i \setminus V_{i-1} = V_i^+ \cup V_i^-$ is the partition introduced above.

A pair of subspaces $(V_{i-1}, V_i)$ is said to be directed if it is indicated which of
two convex subsets of the set $V_i \setminus V_{i-1}$ we denote by $V_i^+$ and which by $V_i^-$. The




sequence of subspaces (8.11) is called a flag if each pair $(V_{i-1}, V_i)$ is directed. We
note that in a flag defined by the sequence (8.11), the subspace $V_0$ has dimension 0,
that is, it consists of a single point. This point is called the center of the flag.


8.3 Affine Transformations 


Definition 8.20 An affine transformation of an affine space $(V, L)$ into another
affine space $(V', L')$ is a pair of mappings


$$f : V \to V', \qquad \mathcal{F} : L \to L',$$

satisfying the following two conditions:


(1) The mapping $\mathcal{F} : L \to L'$ is a linear transformation of vector spaces $L \to L'$.
(2) For every pair of points $A, B \in V$, we have the equality


$$\overrightarrow{f(A)f(B)} = \mathcal{F}(\overrightarrow{AB}).$$


Condition (2) means that the linear transformation $\mathcal{F}$ is determined by the mapping
$f$. It is called the linear part of the mapping $f$ and is denoted by $\Lambda(f)$. In the
sequel we shall, as a rule, indicate only the mapping $f : V \to V'$, since the linear
part $\mathcal{F}$ is uniquely determined by it, and we shall view the affine transformation as
a mapping from $V$ to $V'$.


Theorem 8.21 Affine transformations possess the following properties: 


(a) The composition of two affine transformations $f$ and $g$ is again an affine
transformation, which we denote by $gf$. Here $\Lambda(gf) = \Lambda(g)\Lambda(f)$.

(b) An affine transformation $f$ is bijective if and only if the linear transformation
$\Lambda(f)$ is bijective. In this case, the inverse transformation $f^{-1}$ is also an affine
transformation, and $\Lambda(f^{-1}) = \Lambda(f)^{-1}$.

(c) If $f = e$, the identity transformation, then $\Lambda(f) = \mathcal{E}$.


Proof All these assertions are proved by direct verification.

(a) Let $(V, L)$, $(V', L')$, and $(V'', L'')$ be affine spaces. Let us consider the affine
transformation $f : V \to V'$ with linear part $\mathcal{F} = \Lambda(f)$ and another affine transformation
$g : V' \to V''$ with linear part $\mathcal{G} = \Lambda(g)$. We shall denote the composition of
$f$ and $g$ by $h$, and the composition of $\mathcal{F}$ and $\mathcal{G}$ by $\mathcal{H}$. Then by the definition of the
composition of arbitrary mappings of sets, we have $h : V \to V''$ and $\mathcal{H} : L \to L''$,
and moreover, we know that $\mathcal{H}$ is a linear transformation. Thus we must show that
every pair of points $A, B \in V$ satisfies the equality $\overrightarrow{h(A)h(B)} = \mathcal{H}(\overrightarrow{AB})$. But since
by definition, we have the equalities


$$\overrightarrow{f(A)f(B)} = \mathcal{F}(\overrightarrow{AB}), \qquad \overrightarrow{g(A')g(B')} = \mathcal{G}(\overrightarrow{A'B'})$$


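for arbitrary points $A, B \in V$ and $A', B' \in V'$, it follows that


$$\overrightarrow{h(A)h(B)} = \overrightarrow{g(f(A))\,g(f(B))} = \mathcal{G}\bigl(\overrightarrow{f(A)f(B)}\bigr) = \mathcal{G}\bigl(\mathcal{F}(\overrightarrow{AB})\bigr) = \mathcal{H}(\overrightarrow{AB}).$$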


The proofs of assertions (b) and (c) are just as straightforward. 


Let us give some examples of affine transformations. 


Example 8.22 For affine spaces $(L, L)$ and $(L', L')$, a linear transformation $f = \mathcal{F} : L \to L'$
is affine, and moreover, it is obvious that $\Lambda(f) = \mathcal{F}$.


In the sequel, we shall frequently encounter affine transformations in which the 
affine spaces V and V’ coincide (and this also applies to the spaces of vectors L and 
L’). We shall call such an affine transformation of a space V an affine transformation 
of the space into itself. 


Example 8.23 A translation $\mathcal{T}_a$ by an arbitrary vector $a \in L$ is an affine transformation
of the space $V$ into itself. It follows from the definition of translation that
$\Lambda(\mathcal{T}_a) = \mathcal{E}$. Conversely, every affine transformation whose linear part is equal to $\mathcal{E}$
is a translation. Indeed, by the definition of an affine transformation, the condition
$\Lambda(f) = \mathcal{E}$ implies that $\overrightarrow{f(A)f(B)} = \overrightarrow{AB}$. Recalling Remark 8.2 and Fig. 8.1, we
see that from this assertion follows the equality $\overrightarrow{Af(A)} = \overrightarrow{Bf(B)}$, which implies
that $f = \mathcal{T}_a$, where the vector $a$ is equal to $\overrightarrow{Af(A)}$ for some (any) point $A$ of the
space $V$.


The same reasoning allows us to obtain a more general result.


Theorem 8.24 If affine transformations $f : V \to V'$ and $g : V \to V'$ have identical
linear parts, then they differ only by a translation, that is, there exists a vector $a \in L'$
such that $g = \mathcal{T}_a f$.


Proof By definition, the equality $\Lambda(f) = \Lambda(g)$ implies that
$\overrightarrow{f(A)f(B)} = \overrightarrow{g(A)g(B)}$ for every pair of points $A, B \in V$. From this, the equality


$$\overrightarrow{f(A)g(A)} = \overrightarrow{f(B)g(B)} \tag{8.12}$$


clearly follows. As in Example 8.23, this reasoning is based on Remark 8.2. The
relationship (8.12) implies that the vector $\overrightarrow{f(A)g(A)}$ does not depend on the choice
of the point $A$. We shall denote this vector by $a$. Then by the definition of translation,
$g(A) = \mathcal{T}_a(f(A))$ for every point $A \in V$, which completes the proof of the
theorem.
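
Theorem 8.24 is easy to observe on a concrete pair of maps. In the sketch below, `f` and `g` are hypothetical affine maps of the plane chosen to share the linear part $(x, y) \mapsto (2x + y, x - y)$ and to differ only in their constant terms; the helper `vector` is likewise an illustrative name. The vector from $f(A)$ to $g(A)$ is collected over several sample points and turns out to be the same each time.

```python
def f(p):
    """Hypothetical affine map: linear part (x, y) -> (2x + y, x - y), shift (1, 0)."""
    x, y = p
    return (2 * x + y + 1, x - y)

def g(p):
    """Hypothetical affine map with the same linear part and shift (-3, 5)."""
    x, y = p
    return (2 * x + y - 3, x - y + 5)

def vector(P, Q):
    """Coordinates of the vector PQ."""
    return tuple(q - p for p, q in zip(P, Q))

points = [(0, 0), (1, 0), (2, -7), (10, 3)]

# The set collapses to a single vector a, so that g = T_a f on these points.
shifts = {vector(f(A), g(A)) for A in points}
print(shifts)
```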


Definition 8.25 Let $V' \subset V$ be a subspace of the affine space $V$. An affine transformation
$f : V \to V'$ is said to be a projection onto the subspace $V'$ if $f(V) = V'$
and the restriction of $f$ to $V'$ is the identity transformation.




Fig. 8.4 Fibers of a projection


Theorem 8.26 If $f : V \to V'$ is a projection onto the subspace $V' \subset V$, then the
preimage $f^{-1}(A')$ of an arbitrary point $A' \in V'$ is an affine subspace of $V$ of dimension
$\dim V - \dim V'$. For distinct points $A', A'' \in V'$, the subspaces $f^{-1}(A')$
and $f^{-1}(A'')$ are parallel.


Proof Let $\mathcal{F} = \Lambda(f)$. Then $\mathcal{F} : L \to L'$ is a linear transformation, where $L$ and $L'$
are the respective spaces of vectors of the affine spaces $V$ and $V'$. Let us consider
an arbitrary point $A' \in V'$ and points $P, Q \in f^{-1}(A')$, that is, $f(P) = f(Q) = A'$.
Then the vector $\overrightarrow{f(P)f(Q)}$ is equal to $0$, whence by the definition of an affine
transformation, we obtain that $\overrightarrow{f(P)f(Q)} = \mathcal{F}(\overrightarrow{PQ}) = 0$, that is, the vector $\overrightarrow{PQ}$ is
in the kernel of the linear transformation $\mathcal{F}$, which, as we know, is a subspace of $L$.
Conversely, if $P \in f^{-1}(A')$ and the vector $x$ is in the kernel of the transformation
$\mathcal{F}$, that is, $\mathcal{F}(x) = 0$, then there exists a point $Q \in V$ for which $x = \overrightarrow{PQ}$. Then
$f(P) = f(Q)$ and $Q \in f^{-1}(A')$. By definition, an arbitrary vector $x = \overrightarrow{A'B'} \in L'$
can be represented in the form $\mathcal{F}(\overrightarrow{PQ})$, where $f(P) = A'$ and $f(Q) = B'$. This
means that the image of the transformation $\mathcal{F}$ coincides with the entire space $L'$,
whence by Theorem 3.72, we obtain


$$\dim f^{-1}(A') = \dim \mathcal{F}^{-1}(0) = \dim L - \dim L' = \dim V - \dim V',$$


since $\mathcal{F}^{-1}(0)$ is the kernel of the transformation $\mathcal{F}$, and the number $\dim L'$ is equal
to its rank; see Fig. 8.4. We have already proved that for every point $A' \in V'$, the
space of vectors of the affine space $f^{-1}(A')$ coincides with $\mathcal{F}^{-1}(0)$. This completes
the proof of the theorem.


The subspaces $f^{-1}(A')$ for the points $A' \in V'$ are called fibers of the projection
$f : V \to V'$; see Fig. 8.4. If $S' \subset V'$ is some subset (not necessarily a subspace),
then its preimage, the set $S = f^{-1}(S')$, is called a cylinder in $V$.


Definition 8.27 An affine transformation f : V > V’ is called an isomorphism if it 
is a bijection. Affine spaces V and V’ in this case are said to be isomorphic. 


By assertion (b) of Theorem 8.21, the condition of a transformation $f : V \to V'$
being a bijection is equivalent to the bijectivity of the linear transformation
$\Lambda(f) : L \to L'$ of the corresponding spaces of vectors $L$ and $L'$. Thus affine spaces $V$ and
$V'$ are isomorphic if and only if the corresponding spaces of vectors $L$ and $L'$ are
isomorphic. As shown in Sect. 3.5, vector spaces $L$ and $L'$ are isomorphic if and




only if $\dim L = \dim L'$, and in this situation every nonsingular linear transformation
$L \to L'$ is an isomorphism. This yields the following assertion: affine spaces $V$ and
$V'$ are isomorphic if and only if $\dim V = \dim V'$. Here every affine transformation
$f : V \to V'$ whose linear part $\Lambda(f)$ is nonsingular is an isomorphism between $V$
and $V'$. We shall frequently call an affine transformation $f$ with nonsingular linear
part $\Lambda(f)$ nonsingular.

From the definitions, we immediately obtain the following theorem. 


Theorem 8.28 The affine ratio (A, B,C) of three collinear points does not change 
under a nonsingular affine transformation. 


Proof By definition, the affine ratio $\alpha = (A, B, C)$ of three points $A, B, C$ under
the condition $A \ne B$ is defined by the relationship


$$\overrightarrow{AC} = \alpha\,\overrightarrow{AB}. \tag{8.13}$$


Let $f : V \to V$ be a nonsingular affine transformation and $\mathcal{F} : L \to L$ its corresponding
linear transformation. Then in view of the nonsingularity of the transformation
$f$, we have $f(A) \ne f(B)$ and


$$\overrightarrow{f(A)f(C)} = \mathcal{F}(\overrightarrow{AC}), \qquad \overrightarrow{f(A)f(B)} = \mathcal{F}(\overrightarrow{AB}),$$


and $\beta = (f(A), f(B), f(C))$ is defined by the equality $\overrightarrow{f(A)f(C)} = \beta\,\overrightarrow{f(A)f(B)}$,
that is,


$$\mathcal{F}(\overrightarrow{AC}) = \beta\,\mathcal{F}(\overrightarrow{AB}). \tag{8.14}$$


Applying the transformation $\mathcal{F}$ to both sides of equality (8.13), we obtain
$\mathcal{F}(\overrightarrow{AC}) = \alpha\,\mathcal{F}(\overrightarrow{AB})$, whence taking into account equality (8.14), it follows that $\beta = \alpha$. In the
case that $A = B \ne C$, we obtain, in view of the nonsingularity of $f$, the analogous
relationship $f(A) = f(B) \ne f(C)$, from which we have $(A, B, C) = \infty$ and
$(f(A), f(B), f(C)) = \infty$.
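
The invariance asserted in Theorem 8.28 can be watched on a single example. The following sketch assumes a hypothetical nonsingular affine map `f` of the plane (matrix with rows $(2, 1)$ and $(1, 1)$, determinant 1, plus a shift) and three collinear points chosen for illustration; `ratio` is a hypothetical helper that reads $\alpha$ off one coordinate and does not itself check collinearity.

```python
from fractions import Fraction

def ratio(A, B, C):
    """alpha with AC = alpha * AB, from the first coordinate in which A and B differ."""
    k = next(i for i in range(len(A)) if A[i] != B[i])
    return Fraction(C[k] - A[k], B[k] - A[k])

def f(p):
    """Hypothetical nonsingular affine map: matrix [[2, 1], [1, 1]], shift (3, -1)."""
    x, y = p
    return (2 * x + y + 3, x + y - 1)

A, B = (1, 1), (3, 5)
C = (4, 7)                     # AC = (3/2) * AB, so A, B, C are collinear

before = ratio(A, B, C)
after = ratio(f(A), f(B), f(C))
print(before, after)
```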


Example 8.29 Every affine space $(V, L)$ is isomorphic to the space $(L, L)$. Indeed,
let us choose in the set $V$ an arbitrary point $O$ and define the mapping $f : V \to L$ in
such a way that $f(A) = \overrightarrow{OA}$. It is obvious, by the definition of affine space, that the
mapping $f$ is an isomorphism.


Let us note that the situation here is similar to that of an isomorphism of a vector 
space L and the dual space L*. In one case, the isomorphism requires the choice of 
a basis of L, while in the other, it is the choice of a point O in V. 

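Let $f : V \to V'$ be an affine transformation of affine spaces $(V, L)$ and $(V', L')$.
Let us consider isomorphisms $\varphi : V \to L$ and $\varphi' : V' \to L'$, defined, as in Example 8.29,
by the selection of certain points $O \in V$ and $O' \in V'$. We have the mappings


$$\begin{array}{ccc} V & \xrightarrow{\;f\;} & V' \\ {\scriptstyle\varphi}\downarrow & & \downarrow{\scriptstyle\varphi'} \\ L & \xrightarrow{\;\mathcal{F}\;} & L' \end{array} \tag{8.15}$$


where $\mathcal{F} = \Lambda(f)$. Here, generally speaking, we cannot assert that $\mathcal{F}\varphi = \varphi' f$, but
nevertheless, these mappings are closely related. For an arbitrary point $A \in V$, we
have by construction that $\varphi(A) = \overrightarrow{OA}$ and $\mathcal{F}(\varphi(A)) = \mathcal{F}(\overrightarrow{OA}) = \overrightarrow{f(O)f(A)}$. In
just the same way, $\varphi'(f(A)) = \overrightarrow{O'f(A)}$. Finally, $\overrightarrow{O'f(A)} = \overrightarrow{O'f(O)} + \overrightarrow{f(O)f(A)}$.
Combining these relationships, we obtain


$$\varphi' f = \mathcal{T}_b\,\mathcal{F}\varphi, \quad \text{where } b = \overrightarrow{O'f(O)}. \tag{8.16}$$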


Relationship (8.16) allows us to write down the action of affine transformations
in coordinate form. To do so, we choose frames of reference $(O; e_1, \ldots, e_n)$ and
$(O'; e_1', \ldots, e_m')$, where $n = \dim V$ and $m = \dim V'$, in the spaces $V$ and $V'$. Then
the coordinates of the point $A$ in the chosen frame of reference are the coordinates of
the vector $\overrightarrow{OA} = \varphi(A)$ in the basis $e_1, \ldots, e_n$. Likewise, the coordinates of the point
$f(A)$ are the coordinates of the vector $\overrightarrow{O'f(A)} = \varphi'(f(A))$ in the basis $e_1', \ldots, e_m'$.
Let us make use of relationship (8.16). Suppose the coordinates of the vector $\overrightarrow{OA}$
in the basis $e_1, \ldots, e_n$ are $(\alpha_1, \ldots, \alpha_n)$, the coordinates of the vector $\overrightarrow{O'f(A)}$ in the
basis $e_1', \ldots, e_m'$ are $(\alpha_1', \ldots, \alpha_m')$, and the matrix of the linear transformation $\mathcal{F}$ in
these bases is $F = (f_{ij})$. Setting the coordinates of the vector $b$ from formula (8.16)
in the basis $e_1', \ldots, e_m'$ equal to $(\beta_1, \ldots, \beta_m)$, we obtain


$$\alpha_i' = \sum_{j=1}^{n} f_{ij}\,\alpha_j + \beta_i, \quad i = 1, \ldots, m. \tag{8.17}$$

Using the standard notation for column vectors


$$[\alpha] = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{pmatrix}, \qquad [\alpha'] = \begin{pmatrix} \alpha_1' \\ \vdots \\ \alpha_m' \end{pmatrix}, \qquad [\beta] = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix},$$

we may rewrite formula (8.17) in the form

$$[\alpha'] = F[\alpha] + [\beta]. \tag{8.18}$$
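
Formula (8.18) suggests a simple data representation: an affine transformation in coordinates is a pair consisting of a matrix $F$ and a column $[\beta]$. The sketch below is one possible encoding in Python with exact rational arithmetic; the class name `AffineMap`, its method `after`, and the helpers `mat_vec` and `mat_mul` are hypothetical names introduced for illustration. It also shows property (a) of Theorem 8.21 in coordinates: the matrix of a composition is the product of the matrices.

```python
from fractions import Fraction

def mat_vec(F, x):
    """Product of the matrix F with the column x."""
    return [sum(Fraction(f) * Fraction(v) for f, v in zip(row, x)) for row in F]

def mat_mul(G, F):
    """Product of the matrices G and F."""
    cols = list(zip(*F))
    return [[sum(Fraction(g) * Fraction(f) for g, f in zip(row, col)) for col in cols]
            for row in G]

class AffineMap:
    """Affine map in coordinates: [alpha'] = F [alpha] + [beta], as in (8.18)."""

    def __init__(self, F, beta):
        self.F = [[Fraction(v) for v in row] for row in F]
        self.beta = [Fraction(v) for v in beta]

    def __call__(self, alpha):
        return [y + b for y, b in zip(mat_vec(self.F, alpha), self.beta)]

    def after(self, other):
        """Composition 'self after other': matrix F_self * F_other,
        constant column F_self * beta_other + beta_self."""
        return AffineMap(mat_mul(self.F, other.F), self(other.beta))

# Hypothetical maps of the plane, chosen only for illustration.
f = AffineMap([[1, 2], [0, 1]], [1, 0])
g = AffineMap([[0, -1], [1, 0]], [0, 3])
h = g.after(f)

print(h([1, 2]), g(f([1, 2])))
```

Storing the pair $(F, [\beta])$ rather than a function makes the linear part directly available, which is convenient whenever a statement is phrased in terms of $\Lambda(f)$.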


The most frequent case that we shall encounter in the sequel is that of transformations
of an affine space $V$ into itself. Let us assume that the mapping $f : V \to V$
has a fixed point $O$, that is, for the point $O \in V$, we have $f(O) = O$. Then the
transformation $f$ can be identified with its linear part, that is, if the choice of a
frame of reference $(O; e_1, \ldots, e_n)$ with the fixed point $O$ as origin identifies the affine
space $V$ with the vector space $L$, then the mapping $f$ is identified with its linear part
$\mathcal{F} = \Lambda(f)$. Here $f(O) = O$ and $\overrightarrow{Of(A)} = \mathcal{F}(\overrightarrow{OA})$ for every point $A \in V$.

We shall call such affine transformations of a space $V$ into itself linear (we note
that this notion depends on the choice of point $O \in V$ that $f$ maps to itself). If for an
arbitrary affine transformation $f$ we define $f_0 = \mathcal{T}_{-a} f$, where the vector $a$ is equal
to $\overrightarrow{Of(O)}$, then $f_0$ will be a linear transformation, and we obtain the representation


$$f = \mathcal{T}_a f_0. \tag{8.19}$$


It is obvious that a nonsingular affine transformation of the space $(V, L)$ takes each
frame of reference $(O; e_1, \ldots, e_n)$ into some other frame of reference. This implies
that if $f(O) = O'$ and $\Lambda(f)(e_i) = e_i'$, then $(O'; e_1', \ldots, e_n')$ is also a frame of reference.
Conversely, if the transformation $f$ takes some frame of reference to another
frame of reference, then it is nonsingular.

From the representation (8.19) we obtain the following result. 

If we are given a frame of reference $(O; e_1, \ldots, e_n)$, an arbitrary point $O'$, and
vectors $a_1, \ldots, a_n$ in $L$, then there exists (and it is unique) an affine transformation
$f$ mapping $O$ to $O'$ such that $\Lambda(f)(e_i) = a_i$ for all $i = 1, \ldots, n$. To prove this, we
set $a$ equal to $\overrightarrow{OO'}$ in representation (8.19), and for $f_0$, we take a linear transformation
of the vector space $L$ into itself such that $f_0(e_i) = a_i$ for all $i = 1, \ldots, n$.
It is obvious that the affine transformation $f$ thus constructed satisfies the requisite
conditions. Its uniqueness follows from the representation (8.19) and from the fact
that the vectors $e_1, \ldots, e_n$ form a basis of $L$.

The following reformulation of this statement is obvious: if we are given $n + 1$
independent points $A_0, A_1, \ldots, A_n$ of an $n$-dimensional affine space $V$ and an additional
arbitrary $n + 1$ points $B_0, B_1, \ldots, B_n$, then there exists (and it is unique) an
affine transformation $f : V \to V$ such that $f(A_i) = B_i$ for all $i = 0, 1, \ldots, n$.
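
The existence statement above is constructive, and the construction can be sketched in coordinates. Assuming the points are given as tuples of rationals, the code below solves for a matrix $F$ and a column $[\beta]$ with $F A_i + \beta = B_i$: the rows of $F$ are found from the conditions $F(\overrightarrow{A_0A_i}) = \overrightarrow{B_0B_i}$, and then $\beta = B_0 - F A_0$. The function names `solve`, `affine_map_from_points`, and `apply` are hypothetical, and Gauss-Jordan elimination is merely a convenient choice of solver.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M x = rhs for a nonsingular square matrix M by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Raises StopIteration if M is singular (the points A_i are not independent).
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * p for a, p in zip(aug[i], aug[col])]
    return [row[n] for row in aug]

def affine_map_from_points(As, Bs):
    """Return (F, beta) with F * A_i + beta = B_i for independent A_0, ..., A_n."""
    n = len(As) - 1
    # Rows of X are the vectors A_0A_i; row k of F solves X * F[k] = targets.
    X = [[a - a0 for a, a0 in zip(As[i], As[0])] for i in range(1, n + 1)]
    F = []
    for k in range(len(Bs[0])):
        targets = [Bs[i][k] - Bs[0][k] for i in range(1, n + 1)]
        F.append(solve(X, targets))
    beta = [Bs[0][k] - sum(F[k][j] * As[0][j] for j in range(n))
            for k in range(len(Bs[0]))]
    return F, beta

def apply(F, beta, p):
    """Image of the point p under the map [alpha'] = F [alpha] + [beta]."""
    return [sum(f * x for f, x in zip(row, p)) + b for row, b in zip(F, beta)]

# Hypothetical data: three independent points of the plane and their prescribed images.
As = [(1, 1), (2, 1), (1, 3)]
Bs = [(0, 0), (0, 1), (5, 5)]
F, beta = affine_map_from_points(As, Bs)
print(all(apply(F, beta, A) == list(B) for A, B in zip(As, Bs)))
```

The images $B_i$ need not be independent; in that case the resulting transformation is singular, which the statement in the text allows.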

In the sequel, it will be useful to know about the dependence of the vector $a$
in representation (8.19) on the choice of point $O$ (on its choice also depends the
transformation $f_0$ of the space $V$, but as a transformation of a vector space $L$, it
coincides with $\Lambda(f)$). Let us set $\overrightarrow{OO'} = c$. Then for a new choice of $O'$ as fixed
point, we have, similar to (8.19), the representation


$$f = \mathcal{T}_{a'} f_0', \tag{8.20}$$


where $f_0'(O') = O'$ and the vector $a'$ is equal to $\overrightarrow{O'f(O')}$. By well-known rules, we
have


$$a' = \overrightarrow{O'f(O')} = \overrightarrow{O'O} + \overrightarrow{Of(O')},$$


$$\overrightarrow{Of(O')} = \overrightarrow{Of(O)} + \overrightarrow{f(O)f(O')} = a + \mathcal{F}(c).$$




Since $\overrightarrow{O'O} = -\overrightarrow{OO'}$, we obtain that the vectors $a$ and $a'$ in representations (8.19)
and (8.20) are related by


$$a' = a + \mathcal{F}(c) - c, \quad \text{where } c = \overrightarrow{OO'}. \tag{8.21}$$
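
Relation (8.21) can be confirmed numerically on a concrete map. In the sketch below, the affine map `f`, its linear part `linear_part`, and the two reference points `O` and `O_new` are hypothetical choices made for illustration; the vectors $a$, $a'$, and $c$ are computed from their definitions and compared with the right-hand side of (8.21).

```python
def f(p):
    """Hypothetical affine map of the plane with linear part (x, y) -> (x + 2y, 3x - y)."""
    x, y = p
    return (x + 2 * y + 4, 3 * x - y - 2)

def linear_part(v):
    """The linear part of f applied to a vector v."""
    x, y = v
    return (x + 2 * y, 3 * x - y)

def vector(P, Q):
    """Coordinates of the vector PQ."""
    return tuple(q - p for p, q in zip(P, Q))

O, O_new = (1, 2), (5, -1)
a = vector(O, f(O))                  # a  = O f(O)
a_new = vector(O_new, f(O_new))      # a' = O' f(O')
c = vector(O, O_new)                 # c  = O O'

# Right-hand side of (8.21): a + F(c) - c.
Fc = linear_part(c)
rhs = tuple(ai + fi - ci for ai, fi, ci in zip(a, Fc, c))
print(a_new, rhs)
```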


Let us choose a frame of reference in the affine space $(V, L)$. Let us recall that it
is written in the form $(O; e_1, \ldots, e_n)$ or $(O; A_1, \ldots, A_n)$, where $e_i = \overrightarrow{OA_i}$. Let $f$
be a nonsingular transformation of $V$ into itself, and let it map the frame of reference
$(O; e_1, \ldots, e_n)$ to $(O'; e_1', \ldots, e_n')$. If $e_i' = \overrightarrow{O'A_i'}$, then this implies that $f(O) = O'$
and $f(A_i) = A_i'$ for $i = 1, \ldots, n$.

Let the point $A \in V$ have coordinates $(\alpha_1, \ldots, \alpha_n)$ relative to the frame of reference
$(O; A_1, \ldots, A_n)$. This means that the vector $\overrightarrow{OA}$ is equal to $\alpha_1 e_1 + \cdots + \alpha_n e_n$.
Then the point $f(A)$ determines the vector $\overrightarrow{f(O)f(A)}$, that is, $\mathcal{F}(\overrightarrow{OA})$. And this
vector obviously has, in the basis $e_1', \ldots, e_n'$, the same coordinates as the vector $\overrightarrow{OA}$
in the basis $e_1, \ldots, e_n$, since by definition, $e_i' = \mathcal{F}(e_i)$. Thus the affine transformation
$f$ is defined by the fact that the point $A$ is mapped to the point $f(A)$
having in the frame of reference $(O'; e_1', \ldots, e_n')$ the same coordinates as the point
$A$ had in the frame of reference $(O; e_1, \ldots, e_n)$.


Definition 8.30 Two subsets $S$ and $S'$ of an affine space $V$ are said to be affinely
equivalent if there exists a nonsingular affine transformation $f : V \to V$ such that
$f(S) = S'$.


The previous reasoning shows that this definition is equivalent to saying that in
the space $V$, there exist two frames of reference $(O; e_1, \ldots, e_n)$ and $(O'; e_1', \ldots, e_n')$
such that all points of the set $S$ have the same coordinates with respect to the first
frame of reference as the points of the set $S'$ have with respect to the second.

In the case of real affine spaces, the definition of affine transformations by for- 
mulas (8.17) and (8.18) makes it possible to apply to them Theorem 4.39 on proper 
and improper linear transformations. 


Definition 8.31 A nonsingular affine transformation of a real affine space V to itself 
is said to be proper if its linear part is a proper transformation of the vector space. 
Otherwise, it is called improper. 


Thus by this definition, we consider translations to be proper transformations. 
A bit later, we shall provide a more meaningful justification for this definition. 

By the given definition of affine transformation, whether $f$ is proper or improper
depends on the sign of the determinant of the matrix $F = (f_{ij})$ in formulas (8.17),
(8.18). We observe that this concept relates only to nonsingular transformations of $V$
into itself, since in formulas (8.17) and (8.18), we must have $m = n$.

In order to formulate an analogue to Theorem 4.39, we should refine the sense 
of the assertion about the fact that the family g(t) of affine transformations depends 




continuously on the parameter $t$. By this, we shall understand that for $g(t)$, in the
formula

$$\alpha_i' = \sum_{j=1}^{n} g_{ij}(t)\,\alpha_j + \beta_i(t), \quad i = 1, \ldots, n, \tag{8.22}$$

analogous to (8.17), written in some (arbitrarily chosen) frame of reference of the
space $V$, all coefficients $g_{ij}(t)$ and $\beta_i(t)$ depend continuously on $t$. In particular, if
$G(t) = (g_{ij}(t))$ is a matrix of the linear part of the affine transformation $g(t)$, then
its determinant $|G(t)|$ is a continuous function. If the transformations $g(t)$ are
nonsingular for all $t \in [0, 1]$, then from the properties of continuous functions, it
follows that the determinant $|G(t)|$ has the same sign at all points of
the interval $[0, 1]$.

Thus we shall say that an affine transformation f is continuously deformable
into h if there exists a family g(t) of nonsingular affine transformations, depending
continuously on the parameter $t \in [0, 1]$, such that g(0) = f and g(1) = h. It is
obvious that the property thus defined of affine transformations being continuously 
deformable into each other defines on the set of such transformations an equivalence 
relation, that is, it satisfies the properties of reflexivity, symmetry, and transitivity. 


Theorem 8.32 Two nondegenerate affine transformations of a real affine space are 
continuously deformable into each other if and only if they are either both proper or 
both improper. In particular, a nonsingular affine transformation f is proper if and 
only if it is deformable into the identity. 


Proof Let us begin with the latter, more specific, assertion of the theorem. Let a
nonsingular affine transformation f be continuously deformable into the identity e. Then by
symmetry, there exists a continuous family of nonsingular affine transformations
g(t) with linear part $\Lambda(g(t))$ such that g(0) = e and g(1) = f. For the
transformation g(t), let us write (8.22) in some frame of reference $(O; e_1, \dots, e_n)$ of the
space V. It is obvious that for the matrix $G(t) = (g_{ij}(t))$, we have the
relationships G(0) = E and G(1) = F, where F is the matrix of the linear transformation
$\mathcal{F} = \Lambda(f)$ in the basis $e_1, \dots, e_n$ of the space L, and $\beta_i(0) = 0$ for all i = 1, ..., n.
By the definition of continuous deformation, the determinant |G(t)| is nonzero for
all $t \in [0, 1]$. Since |G(0)| = |E| = 1, it follows that |G(t)| > 0 for all $t \in [0, 1]$, and
in particular, for t = 1. And this means that $|\Lambda(f)| = |G(1)| > 0$. Thus the linear
transformation $\Lambda(f)$ is proper, and by definition, the affine transformation f is also
proper.

Conversely, let f be a proper affine transformation. This means that the linear
transformation $\Lambda(f)$ is proper. Then by Theorem 4.39, the transformation $\Lambda(f)$ is
continuously deformable into the identity. Let $\mathcal{G}(t)$ be a family of linear
transformations such that $\mathcal{G}(0) = \mathcal{E}$ and $\mathcal{G}(1) = \Lambda(f)$, given in some basis $e_1, \dots, e_n$ of the
space L by the formula

\[ \alpha'_i = \sum_{j=1}^{n} g_{ij}(t)\,\alpha_j, \quad i = 1, \dots, n, \tag{8.23} \]

where $g_{ij}(t)$ are continuous functions, the matrix $G(t) = (g_{ij}(t))$ is nonsingular for
all $t \in [0, 1]$, and we have the equalities G(0) = E, G(1) = F, where F is the matrix
of the transformation $\Lambda(f)$ in the same basis $e_1, \dots, e_n$.
Let us consider the family g(t) of affine transformations given in the frame of
reference $(O; e_1, \dots, e_n)$ by the formula

\[ \alpha'_i = \sum_{j=1}^{n} g_{ij}(t)\,\alpha_j + \beta_i t, \quad i = 1, \dots, n, \]

in which the coefficients $g_{ij}(t)$ are taken from formula (8.23), while the
coefficients $\beta_i$ are from formula (8.17) for the transformation f in the same frame of
reference $(O; e_1, \dots, e_n)$. Since $\mathcal{G}(0) = \mathcal{E}$ and $\mathcal{G}(1) = \Lambda(f)$, it is obvious that g(0) = e
and g(1) = f, and moreover, |G(t)| > 0 for all $t \in [0, 1]$, that is, the transformation
g(t) is nonsingular for all $t \in [0, 1]$.

From this it follows by transitivity that every pair of proper affine transformations 
are continuously deformable into each other. 

The case of improper affine transformations is handled completely analogously. 
It is necessary only to note that in all the arguments above, one must replace
the identity transformation $\mathcal{E}$ by some fixed improper linear transformation of the
space L.
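As an illustration of the family g(t) used in the proof, here is a minimal Python sketch for one special case, chosen as an assumption for the example: f is a rotation of the plane through an angle theta followed by a translation by beta. The deformation of the linear part is then simply the rotation through t·theta, rather than the general construction of Theorem 4.39. The names `rotation`, `family`, and `det2` are hypothetical.

```python
import math


def rotation(theta):
    """Matrix of the rotation of the plane through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def family(theta, beta):
    """Return t -> (G(t), translation part) with G(t) = rotation(t*theta), shift t*beta."""
    def g(t):
        return rotation(t * theta), (t * beta[0], t * beta[1])
    return g


g = family(math.pi / 3, (2.0, -1.0))

# g(0) is the identity, g(1) is f, and |G(t)| stays positive along the way.
for k in range(11):
    matrix, shift = g(k / 10)
    print(k / 10, round(det2(matrix), 12), shift)
```

In this special case |G(t)| equals 1 for every t, so the family never leaves the set of proper transformations.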


Theorem 8.32 shows that analogously to real vector spaces, in every real affine 
space there exist two orientations, from which we may select arbitrarily whichever 
one we wish. 


8.4 Affine Euclidean Spaces and Motions 


Definition 8.33 An affine space (V,L) is called an affine Euclidean space if the 
vector space L is a Euclidean space. 


This means that for every pair of vectors $x, y \in L$ there is defined a scalar product
(x, y) satisfying the conditions enumerated in Sect. 7.1. In particular, $(x, x) \ge 0$ for
all $x \in L$, and there is a definition of the length $|x| = \sqrt{(x, x)}$ of a vector x. Since
every pair of points $A, B \in V$ defines a vector $\overrightarrow{AB} \in L$, it follows that one can
associate with every pair of points A and B the number

\[ r(A, B) = |\overrightarrow{AB}|, \]


called the distance between the points A and B in V. This notion of distance that 
we have introduced satisfies the conditions for a metric introduced on p. xvii: 


(1) $r(A, B) > 0$ for $A \ne B$ and $r(A, A) = 0$;
(2) $r(A, B) = r(B, A)$ for every pair of points A and B;
(3) for every three points A, B, and C, the triangle inequality is satisfied:

\[ r(A, C) \le r(A, B) + r(B, C). \tag{8.24} \]


Properties (1) and (2) clearly follow from the properties of the scalar product. Let
us prove inequality (8.24), a special case of which (for right triangles) was proved
on p. 216. By definition, if $\overrightarrow{AB} = x$ and $\overrightarrow{BC} = y$, then $\overrightarrow{AC} = x + y$, and (8.24) is
equivalent to the inequality

\[ |x + y| \le |x| + |y|. \tag{8.25} \]

Since there are nonnegative numbers on the left- and right-hand sides of (8.25), we
can square both sides and obtain an equivalent inequality, which we shall prove:

\[ |x + y|^2 \le (|x| + |y|)^2. \tag{8.26} \]

Since

\[ |x + y|^2 = (x + y, x + y) = |x|^2 + 2(x, y) + |y|^2, \]

then after multiplying out the right-hand side of (8.26), we can rewrite this inequality
in the form

\[ |x|^2 + 2(x, y) + |y|^2 \le |x|^2 + 2|x| \cdot |y| + |y|^2. \]

Subtracting like terms from the left- and right-hand sides, we arrive at the inequality

\[ (x, y) \le |x| \cdot |y|, \]

which is the Cauchy–Schwarz inequality (7.6).

Thus an affine Euclidean space is a metric space. 
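The distance just defined is easy to experiment with numerically. The following Python sketch assumes that points are given by their coordinates in an orthonormal frame of reference, so that the scalar product is the usual sum of products; the function names are hypothetical.

```python
import math


def distance(a, b):
    """r(A, B) = |AB|, computed from coordinates in an orthonormal frame."""
    return math.sqrt(sum((q - p) ** 2 for p, q in zip(a, b)))


def satisfies_triangle_inequality(a, b, c, tol=1e-12):
    """Check r(A, C) <= r(A, B) + r(B, C), with a small tolerance for rounding."""
    return distance(a, c) <= distance(a, b) + distance(b, c) + tol


A, B, C = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)
print(distance(A, B))
print(satisfies_triangle_inequality(A, B, C))
```

A numerical check of this kind illustrates properties (1)–(3) on sample points only; it does not replace the proof given above.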

In Sect. 8.1, we defined a frame of reference of an affine space as a point O in
V and a basis $e_1, \dots, e_n$ in L. If our affine space (V, L) is an affine Euclidean space, and
the basis $e_1, \dots, e_n$ is orthonormal, then the frame of reference $(O; e_1, \dots, e_n)$ is
also said to be orthonormal. We see that an orthonormal frame of reference can be
associated with each point $O \in V$.


Definition 8.34 A mapping $g : V \to V$ of an affine Euclidean space V into itself is
said to be a motion if it is an isometry of V as a metric space, that is, if it preserves
distances between points. This means that for every pair of points $A, B \in V$, the
following equality holds:


r(g(A), g(B)) =r(A, B). (8.27) 


Let us emphasize that in this definition, we are speaking about an arbitrary
mapping $g : V \to V$, which in general, does not have to be an affine transformation. By
the discussion presented on p. xxi, a mapping $g : V \to V$ is a motion if its image
satisfies g(V) = V and g also satisfies the condition (8.27) of preserving distances.




Example 8.35 Let a be a vector in the vector space L corresponding to the affine
space V. Then the translation $\mathcal{T}_a$ is a motion. Indeed, by the definition of a
translation, for every point $A \in V$ we have the equality $\mathcal{T}_a(A) = B$, where $\overrightarrow{AB} = a$. If for
some other point C, we have an analogous equality $\mathcal{T}_a(C) = D$, then $\overrightarrow{CD} = a$. By
condition (2) in the definition of an affine space, we have the equality $\overrightarrow{AB} = \overrightarrow{CD}$,
from which, by Remark 8.2, it follows that $\overrightarrow{AC} = \overrightarrow{BD}$. This means that $|\overrightarrow{AC}| = |\overrightarrow{BD}|$,
or equivalently, $r(A, C) = r(\mathcal{T}_a(A), \mathcal{T}_a(C))$, as asserted.


Example 8.36 Let us assume that the mapping $g : V \to V$ has the fixed point O,
that is, the point $O \in V$ satisfies the equality g(O) = O. As we saw in Sect. 8.3, the
choice of point O determines a bijective mapping $V \to L$, where L is the space of
vectors of the affine space V. Here to a point $A \in V$ corresponds the vector $\overrightarrow{OA} \in L$.
Thus the mapping $g : V \to V$ defines a mapping $\mathcal{G} : L \to L$ such that $\mathcal{G}(0) = 0$.
Let us emphasize that since we did not assume that the mapping g was an affine
transformation, the mapping $\mathcal{G}$, in general, is not a linear transformation of the
space L. Now let us check that if $\mathcal{G}$ is a linear orthogonal transformation of the
Euclidean space L, then g is a motion.

By definition, the transformation $\mathcal{G}$ is defined by the condition $\mathcal{G}(\overrightarrow{OA}) = \overrightarrow{Og(A)}$.
We must prove that g is a motion, that is, that for all pairs of points A and B, we
have

\[ |\overrightarrow{g(A)g(B)}| = |\overrightarrow{AB}|. \tag{8.28} \]

We have the equality $\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA}$, and we obtain that

\[ \overrightarrow{g(A)g(B)} = \overrightarrow{g(A)O} + \overrightarrow{Og(B)} = \overrightarrow{Og(B)} - \overrightarrow{Og(A)}, \]

and this vector, by the definition of the transformation $\mathcal{G}$, is equal to
$\mathcal{G}(\overrightarrow{OB}) - \mathcal{G}(\overrightarrow{OA})$. In view of the fact that the transformation $\mathcal{G}$ is assumed to be linear, this
vector is equal to $\mathcal{G}(\overrightarrow{OB} - \overrightarrow{OA})$. But as we have seen, $\overrightarrow{OB} - \overrightarrow{OA} = \overrightarrow{AB}$, and this
means that

\[ \overrightarrow{g(A)g(B)} = \mathcal{G}(\overrightarrow{AB}). \]

From the orthogonality of the transformation $\mathcal{G}$ it follows that $|\mathcal{G}(\overrightarrow{AB})| = |\overrightarrow{AB}|$. In
combination with the previous relationships, this yields the required equality (8.28).


The concept of motion is the most natural mathematical abstraction correspond- 
ing to the idea of the displacement of a solid body in space. We may apply to the 
analysis of this all of the results obtained in the preceding chapters, on the basis of 
the following fundamental assertion. 


Theorem 8.37 Every motion is an affine transformation. 


Proof Let f be a motion of the affine Euclidean space V. As a first step, let us
choose in V an arbitrary point O and consider the vector $a = \overrightarrow{Of(O)}$ and the mapping
$g = \mathcal{T}_{-a} f$ of the space V into itself. Here the product $\mathcal{T}_{-a} f$, as usual, denotes
sequential application (composition) of the mappings f and $\mathcal{T}_{-a}$. Then O is a fixed
point of the transformation g, that is, g(O) = O. Indeed, $g(O) = \mathcal{T}_{-a}(f(O))$, and
by the definition of translation, the equality g(O) = O is equivalent to $\overrightarrow{f(O)O} = -a$,
and this clearly follows from the fact that $a = \overrightarrow{Of(O)}$.

We now observe that the product (that is, the sequential application, or
composition) of two motions $g_1$ and $g_2$ is also a motion; the verification of this follows at
once from the definition. Since we know that $\mathcal{T}_{-a}$ is a motion (see Example 8.35), it
follows that g is also a motion. We therefore obtain a representation of f in the form
$f = \mathcal{T}_a g$, where g is a motion and g(O) = O. Thus as we saw in Example 8.36, g
defines a mapping $\mathcal{G}$ of the space L into itself. The main part of the proof consists in
verifying that $\mathcal{G}$ is a linear transformation.

We shall base this verification on the following simple proposition. 


Lemma 8.38 Assume that we are given a mapping $\mathcal{G}$ of a vector space L into itself
and a basis $e_1, \dots, e_n$ of L. Let us set $\mathcal{G}(e_i) = e'_i$, i = 1, ..., n, and assume that for
every vector

\[ x = \alpha_1 e_1 + \cdots + \alpha_n e_n, \tag{8.29} \]

its image

\[ \mathcal{G}(x) = \alpha_1 e'_1 + \cdots + \alpha_n e'_n \tag{8.30} \]

has the same $\alpha_1, \dots, \alpha_n$. Then $\mathcal{G}$ is a linear transformation.


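Proof We must verify two conditions that enter into the definition of a linear
transformation:

(a) $\mathcal{G}(x + y) = \mathcal{G}(x) + \mathcal{G}(y)$,
(b) $\mathcal{G}(\alpha x) = \alpha\,\mathcal{G}(x)$,

for all vectors x and y and every scalar $\alpha$.
The verification of this is trivial. (a) Let the vectors x and y be given by
$x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ and $y = \beta_1 e_1 + \cdots + \beta_n e_n$. Then their sum is given by

\[ x + y = (\alpha_1 + \beta_1) e_1 + \cdots + (\alpha_n + \beta_n) e_n. \]

On the other hand, by the condition of the lemma, we have

\[ \mathcal{G}(x + y) = (\alpha_1 + \beta_1) e'_1 + \cdots + (\alpha_n + \beta_n) e'_n
 = (\alpha_1 e'_1 + \cdots + \alpha_n e'_n) + (\beta_1 e'_1 + \cdots + \beta_n e'_n) = \mathcal{G}(x) + \mathcal{G}(y). \]

(b) For the vector $x = \alpha_1 e_1 + \cdots + \alpha_n e_n$ and an arbitrary scalar $\alpha$, we have

\[ \alpha x = (\alpha\alpha_1) e_1 + \cdots + (\alpha\alpha_n) e_n. \]

By the condition of the lemma,

\[ \mathcal{G}(\alpha x) = (\alpha\alpha_1) e'_1 + \cdots + (\alpha\alpha_n) e'_n = \alpha(\alpha_1 e'_1 + \cdots + \alpha_n e'_n) = \alpha\,\mathcal{G}(x). \]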


We now return to the proof of Theorem 8.37. Let us verify that the mapping
$\mathcal{G} : L \to L$ constructed above satisfies the condition of the lemma. To this end,
let us first ascertain that it preserves the inner product in L, that is, that for all vectors
$x, y \in L$, we have the equality

\[ (\mathcal{G}(x), \mathcal{G}(y)) = (x, y). \tag{8.31} \]


Let us recall that the property of the transformation g being a motion can be
formulated as the following condition on the transformation $\mathcal{G}$ of the vector space L:

\[ |\mathcal{G}(x) - \mathcal{G}(y)| = |x - y| \tag{8.32} \]

for all pairs of vectors x and y. Squaring both sides of equality (8.32), we obtain

\[ |\mathcal{G}(x) - \mathcal{G}(y)|^2 = |x - y|^2. \tag{8.33} \]

Since x and y are vectors in the Euclidean space L, we have

\[ |x - y|^2 = |x|^2 - 2(x, y) + |y|^2, \]
\[ |\mathcal{G}(x) - \mathcal{G}(y)|^2 = |\mathcal{G}(x)|^2 - 2(\mathcal{G}(x), \mathcal{G}(y)) + |\mathcal{G}(y)|^2. \]

Putting these expressions into equality (8.33), we find that

\[ |\mathcal{G}(x)|^2 - 2(\mathcal{G}(x), \mathcal{G}(y)) + |\mathcal{G}(y)|^2 = |x|^2 - 2(x, y) + |y|^2. \tag{8.34} \]

Setting the vector y equal to 0 in relationship (8.34), and taking into account that
$\mathcal{G}(0) = 0$, we obtain the equality $|\mathcal{G}(x)| = |x|$ for all $x \in L$. Finally, taking into
account the relationships $|\mathcal{G}(x)| = |x|$ and $|\mathcal{G}(y)| = |y|$, from (8.34) follows the
required equality (8.31).

Thus for any orthonormal basis $e_1, \dots, e_n$, the vectors $e'_1, \dots, e'_n$ defined by the
relationships $\mathcal{G}(e_i) = e'_i$ also form an orthonormal basis. In the orthonormal basis
$e_1, \dots, e_n$, the coordinates of the vector $x = x_1 e_1 + \cdots + x_n e_n$ are given by the
formula $x_i = (x, e_i)$. From this and (8.31) we obtain that
$(\mathcal{G}(x), e'_i) = (\mathcal{G}(x), \mathcal{G}(e_i)) = (x, e_i) = x_i$, and this implies that

\[ \mathcal{G}(x) = x_1 e'_1 + \cdots + x_n e'_n, \]

that is, the constructed mapping $\mathcal{G} : L \to L$ satisfies the condition of the lemma.
From this it follows that $\mathcal{G}$ is a linear transformation of the space L, and by property
(8.31), it is an orthogonal transformation.


Let us note that along the way, we have proved the possibility of expressing an
arbitrary motion f in the form of the product

\[ f = \mathcal{T}_a g, \tag{8.35} \]

where $\mathcal{T}_a$ is a translation, and g has a fixed point O and corresponds to some
orthogonal transformation $\mathcal{G}$ of the space L (see Example 8.36). From the representation
(8.35) and results of Sect. 8.3, it follows that any two orthonormal frames of reference
can be mapped into each other by a motion, and moreover, this motion is unique.
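The construction in the proof can be traced on a concrete example. The Python sketch below assumes that a motion is available as a function on coordinate tuples in an orthonormal frame; it recovers the vector a as the displacement of the chosen point O, and the matrix of the linear part from the images of the points O + e_i. The names `decompose`, `is_orthogonal`, and `sample_motion` are hypothetical.

```python
def decompose(f, origin):
    """Split a motion f as 'linear part about origin, then translation by a'.

    Returns (a, matrix), where a = f(O) - O and column i of the matrix is
    f(O + e_i) - f(O), the image of the basis vector e_i under the linear part.
    """
    dim = len(origin)
    image_of_origin = f(origin)
    a = tuple(q - p for p, q in zip(origin, image_of_origin))
    columns = []
    for i in range(dim):
        shifted = tuple(p + (1.0 if j == i else 0.0) for j, p in enumerate(origin))
        image = f(shifted)
        columns.append(tuple(u - v for u, v in zip(image, image_of_origin)))
    matrix = [[columns[c][r] for c in range(dim)] for r in range(dim)]
    return a, matrix


def is_orthogonal(matrix, tol=1e-9):
    """Check that the columns of the matrix form an orthonormal system."""
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            dot = sum(matrix[r][i] * matrix[r][j] for r in range(n))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


def sample_motion(p):
    """Hypothetical motion: quarter-turn about the origin, then a shift by (3, 1)."""
    x, y = p
    return (-y + 3.0, x + 1.0)


a, G = decompose(sample_motion, (0.0, 0.0))
print(a, G, is_orthogonal(G))
```

Choosing a different point O changes the vector a but not the matrix; for this sample motion the choice O = (1, 2) gives a = 0, since that point is fixed.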


For studying motions, we may make use of the structure of orthogonal
transformations already investigated in Sect. 7.2, that is, Theorem 7.27. By this theorem, for
every orthogonal transformation, in particular, for the transformation $\mathcal{G}$ associated
with the motion g in formula (8.35), there exists an orthonormal basis in which the
matrix of the transformation $\mathcal{G}$ is in block-diagonal form:

\[
\begin{pmatrix}
1 & & & & & & & & 0 \\
& \ddots & & & & & & & \\
& & 1 & & & & & & \\
& & & -1 & & & & & \\
& & & & \ddots & & & & \\
& & & & & -1 & & & \\
& & & & & & G_{\varphi_1} & & \\
& & & & & & & \ddots & \\
0 & & & & & & & & G_{\varphi_r}
\end{pmatrix}, \tag{8.36}
\]

where

\[ G_{\varphi_i} = \begin{pmatrix} \cos\varphi_i & -\sin\varphi_i \\ \sin\varphi_i & \cos\varphi_i \end{pmatrix} \tag{8.37} \]

and $\varphi_i \ne \pi k$, $k \in \mathbb{Z}$. Two instances of the number −1 on the main diagonal of the
matrix (8.36) can be replaced by the matrix $G_\varphi$ of the form (8.37) with $\varphi = \pi$,
so that it is possible to assume that in the matrix (8.36), the number −1 is absent
or is encountered exactly one time, and in this case, $0 < \varphi_i < 2\pi$. Under such a
convention, we obtain that if the transformation $\mathcal{G}$ is proper, then the number −1
does not appear on the main diagonal, while if $\mathcal{G}$ is improper, there is exactly one
such occurrence.

From the aforesaid, it follows that in the case of a proper transformation $\mathcal{G}$ of the
space L of dimension n, we have the orthogonal decomposition

\[ L = L_0 \oplus L_1 \oplus \cdots \oplus L_k, \quad \text{where } L_i \perp L_j \text{ for all } i \ne j, \tag{8.38} \]

where all subspaces $L_0, \dots, L_k$ are invariant with respect to the transformation $\mathcal{G}$,
and $\dim L_0 = n - 2k$, $\dim L_i = 2$ for all i = 1, ..., k. The restriction of $\mathcal{G}$ to $L_0$
is the identity transformation, while the restriction of $\mathcal{G}$ to the subspace $L_i$ with
i = 1, ..., k is a rotation through the angle $\varphi_i$.

But if the transformation $\mathcal{G}$ is improper, then on the main diagonal of the
matrix (8.36) the number −1 is encountered once. Then in the orthogonal
decomposition (8.38), there is added one additional one-dimensional term $L_{k+1}$, in which the
transformation $\mathcal{G}$ takes each vector x to the opposite vector −x. The orthogonal
decomposition of the space L into a sum of subspaces invariant with respect to the
transformation $\mathcal{G}$ takes the form

\[ L = L_0 \oplus L_1 \oplus \cdots \oplus L_k \oplus L_{k+1}, \quad \text{where } L_i \perp L_j \text{ for all } i \ne j, \tag{8.39} \]

where $\dim L_i = 2$ for i = 1, ..., k, $\dim L_0 = n - 2k - 1$, and $\dim L_{k+1} = 1$.


Now we shall make use of the arbitrariness in the selection of O in the
representation (8.35) of the motion f. By formula (8.21), for a change in the point O, the
vector a in (8.35) is replaced by the vector $a + \mathcal{G}(c) - c$, where for c, one can take
an arbitrary vector of the space L. We have the representation

\[ c = c_0 + c_1 + \cdots + c_k, \quad c_i \in L_i, \tag{8.40} \]

in the case of the decomposition (8.38), or else we have

\[ c = c_0 + c_1 + \cdots + c_k + c_{k+1}, \quad c_i \in L_i, \tag{8.41} \]

in the case of the decomposition (8.39).

Since $\mathcal{G}(x) = x$ for every vector $x \in L_0$, the term $c_0$ makes no contribution to
the vector $\mathcal{G}(c) - c$ added to a. For i > 0, the situation is precisely the reverse:
the transformation $\mathcal{G} - \mathcal{E}$ defines a nonsingular transformation in $L_i$. This follows
from the fact that the kernel of the transformation $\mathcal{G} - \mathcal{E}$ in $L_i$ is equal to (0), which is
obvious for a rotation through the angle $\varphi_i$, $0 < \varphi_i < 2\pi$, in the plane and for the
transformation $-\mathcal{E}$ on a line. Therefore, the image of the transformation $\mathcal{G} - \mathcal{E}$ in
$L_i$ is equal to the entire subspace $L_i$ for i > 0. That is, every vector of $L_i$ can be
represented in the form $\mathcal{G}(c_i) - c_i$, where $c_i$ is some other vector of the same
space $L_i$, i > 0.

Thus in accordance with the representations (8.40) and (8.41), the vector a can
be written in the form $a = a_0 + a_1 + \cdots + a_k$ or $a = a_0 + a_1 + \cdots + a_k + a_{k+1}$,
depending on whether the transformation $\mathcal{G}$ is proper or improper. Since $-a_i$ also
belongs to $L_i$, we may choose vectors $c_i \in L_i$, i > 0, such that $\mathcal{G}(c_i) - c_i = -a_i$,
and define the vector c by relationship (8.40) or (8.41), respectively. As a result, we
obtain the equality

\[ a + \mathcal{G}(c) - c = a_0, \]

meaning that by our selection of the point O, we can obtain that the vector a is
contained in the subspace $L_0$.
We have thus proved the following theorem.
We have thus proved the following theorem. 


Theorem 8.39 Every motion f of an affine Euclidean space V can be represented
in the form

\[ f = \mathcal{T}_a g, \tag{8.42} \]

where the transformation g has fixed point O and corresponds to the orthogonal
transformation $\mathcal{G} = \Lambda(g)$, while $\mathcal{T}_a$ is a translation by the vector a such that

\[ \mathcal{G}(a) = a. \]
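In the plane, the argument leading to this theorem becomes a small computation. If the linear part is a rotation through an angle that is not a multiple of 2π, then the subspace of fixed vectors is zero, so the translation can be removed entirely by solving (G − E)c = −a for c; the point O + c is then a fixed point of the motion. The following Python sketch assumes coordinates in an orthonormal frame with O at the origin; the function names are hypothetical.

```python
import math


def rotation_center(phi, a):
    """Solve (G - E) c = -a, where G is the rotation of the plane through phi."""
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    # Entries of the matrix G - E.
    m11, m12 = cos_p - 1.0, -sin_p
    m21, m22 = sin_p, cos_p - 1.0
    d = m11 * m22 - m12 * m21  # equals 2 - 2*cos(phi)
    if abs(d) < 1e-12:
        raise ValueError("phi is a multiple of 2*pi: G - E is singular")
    r1, r2 = -a[0], -a[1]
    # Cramer's rule for the 2x2 system.
    c1 = (r1 * m22 - m12 * r2) / d
    c2 = (m11 * r2 - r1 * m21) / d
    return (c1, c2)


def apply_motion(phi, a, p):
    """Rotation through phi about the origin followed by translation by a."""
    cos_p, sin_p = math.cos(phi), math.sin(phi)
    return (cos_p * p[0] - sin_p * p[1] + a[0],
            sin_p * p[0] + cos_p * p[1] + a[1])


c = rotation_center(math.pi / 2, (1.0, 0.0))
print(c, apply_motion(math.pi / 2, (1.0, 0.0), c))
```

The printed point and its image agree up to rounding, which is the plane case of the statement: relative to the new point, the motion is a pure rotation.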


Let us consider the most visual example, that of the “physical” three-dimensional 
space in which we live. Here there are two possible cases. 


Case 1: The motion f is proper. Then the orthogonal transformation $\mathcal{G} : L \to L$ is
also proper. Since dim L = 3, the decomposition (8.38) has the form

\[ L = L_0 \oplus L_1, \quad L_0 \perp L_1, \]

where $\dim L_0 = 1$ and $\dim L_1 = 2$. The transformation $\mathcal{G}$ leaves vectors in $L_0$ fixed
and defines a rotation through the angle $0 < \varphi < 2\pi$ in the plane $L_1$. Representation
(8.42) shows that the transformation f can be obtained as a rotation through the
angle $\varphi$ about the line $L_0$ and a translation in the direction of $L_0$; see Fig. 8.5.

Fig. 8.5 A proper motion

This result can be given a different formulation. Suppose a solid body executes an 
arbitrarily complex motion over time. Then its initial position can be superimposed 
on its final position by a rotation around some axis and a translation along that 
axis. Indeed, since it is a solid body, its final position is obtained from the initial 
position by some motion f. Since this change in position is obtained as a continuous
motion, it follows that it is proper. Thus we may employ the three-dimensional case 
of Theorem 8.39. This result is known as Euler’s theorem. 


Case 2: The motion f is improper. Then the orthogonal transformation $\mathcal{G} : L \to L$ is
also improper. Since dim L = 3, the decomposition (8.39) has the form

\[ L = L_0 \oplus L_1 \oplus L_2, \quad L_i \perp L_j \ (i \ne j), \]

where $L_0 = (0)$, $\dim L_1 = 2$, and $\dim L_2 = 1$. The transformation $\mathcal{G}$ defines a rotation
through the angle $0 < \varphi < 2\pi$ in the plane $L_1$ and carries each vector on the line $L_2$
into its opposite. From this it follows that the equality $\mathcal{G}(a) = a$ holds only for
the vector a = 0, and therefore, the translation $\mathcal{T}_a$ in formula (8.42) is equal to the
identity transformation. Therefore, the motion f always has the fixed point O, and
can be obtained as a rotation through the angle $0 < \varphi < 2\pi$ in the plane $L_1$ passing
through this point followed by a reflection in the plane $L_1$.


The theory of motions in an affine Euclidean space can be given a more graphical 
form if we employ the notion of flags, which was introduced in Sect. 8.2 (p. 300). 
First, it is clear that a motion of a space carries a flag to a flag. The main result, 
which we in fact have already proved, can be formulated as follows. 


Theorem 8.40 For every pair of flags, there exists a motion taking the first flag to 
the second, and such a motion is unique. 


Proof To prove the theorem, we observe that for an arbitrary flag

\[ V_0 \subset V_1 \subset \cdots \subset V_n = V, \tag{8.43} \]

the affine subspace $V_0$ consists by definition of a single point. Setting $V_0 = O$, we
may identify each subspace $V_i$ with the subspace $L_i \subset L$, where $L_i$ is the space of
vectors of the affine space $V_i$. Here the sequence

\[ L_0 \subset L_1 \subset \cdots \subset L_n = L \tag{8.44} \]

defines a flag in L. On the other hand, we saw in Sect. 7.2 that the flag (8.44)
is uniquely associated with an orthonormal basis $e_1, \dots, e_n$ in L. Thus
$L_i = \langle e_1, \dots, e_i \rangle$ and $e_i \in L_i^{+}$, as established in Sect. 7.2. This means that the flag (8.43) is
uniquely determined by some orthonormal frame of reference $(O; e_1, \dots, e_n)$ in V.
As we noted above, for two orthonormal frames of reference, there exists a unique 
motion of the space V taking the first frame of reference to the second. This holds, 
then, for two flags of the form (8.43), which proves the assertion of the theorem. 


The property proved in Theorem 8.40 is called “free mobility” of an affine Eu- 
clidean space. In the case of three-dimensional space, this assertion is a mathemati- 
cal expression of the fact that in space, a solid body can be arbitrarily translated and 
rotated. 

In an affine Euclidean space, the distance r(A, B) between any two points does 
not change under a motion of the space. In a general affine space it is impossible to 
associate with each pair of points a number that would be invariant under every non- 
singular affine transformation. This follows from the fact that for an arbitrary pair of 
points A, B and another arbitrary pair A’, B’, there exists an affine transformation 
f taking A to A’ and B to B’. 

To prove this, let us write down a transformation f according to formula (8.19)
in the form $f = \mathcal{T}_a f_0$, choosing the point A as the point O. Here A is a fixed point
of the affine transformation $f_0$, that is, $f_0(A) = A$. The transformation $f_0$ is defined
by some linear transformation $\mathcal{F}$ of the space of vectors L of our affine space V and is
uniquely defined by the relation

\[ \overrightarrow{A f_0(C)} = \mathcal{F}(\overrightarrow{AC}), \quad C \in V. \]

Then the condition f(A) = A' will be satisfied if we set $a = \overrightarrow{AA'}$. It remains to
select a linear transformation $\mathcal{F} : L \to L$ so as to satisfy the equality f(B) = B',
that is, $\mathcal{T}_a f_0(B) = B'$, which is equivalent to the relationship

\[ f_0(B) = \mathcal{T}_{-a}(B'). \tag{8.45} \]

We set the vector x equal to $\overrightarrow{AB}$ (under the condition $A \ne B$, whence $x \ne 0$) and
consider the point $P = \mathcal{T}_{-a}(B')$ and vector $y = \overrightarrow{AP}$. Then the relationship (8.45) is
equivalent to the equality $\mathcal{F}(x) = y$. It remains only to find a linear transformation
$\mathcal{F} : L \to L$ for which the condition $\mathcal{F}(x) = y$ is satisfied for given vectors x and y,
with $x \ne 0$. For this, we must extend the vector x to a basis of the space L and define
$\mathcal{F}$ in terms of the vectors of this basis arbitrarily, provided only that the condition
$\mathcal{F}(x) = y$ is satisfied.
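For the plane, one concrete choice of such a transformation can be written down explicitly. The Python sketch below is an illustration under additional assumptions not made in the text: the space is two-dimensional, coordinates are rational, and the pairs satisfy A ≠ B and A' ≠ B', so that the resulting linear part is nonsingular. The basis extending x is taken to be x together with the vector obtained from x by a quarter-turn, which is only one of many admissible choices. The function name is hypothetical.

```python
from fractions import Fraction


def affine_map_through(A, B, A2, B2):
    """Return an affine map f of the plane with f(A) = A2 and f(B) = B2 (A != B)."""
    x = (Fraction(B[0] - A[0]), Fraction(B[1] - A[1]))
    y = (Fraction(B2[0] - A2[0]), Fraction(B2[1] - A2[1]))
    norm_sq = x[0] ** 2 + x[1] ** 2
    if norm_sq == 0:
        raise ValueError("the points A and B must be distinct")
    # Linear part F = [[p, -q], [q, p]] sends x to y (and the quarter-turn of x
    # to the quarter-turn of y); its determinant is |y|^2 / |x|^2.
    p = (x[0] * y[0] + x[1] * y[1]) / norm_sq
    q = (x[0] * y[1] - x[1] * y[0]) / norm_sq

    def f(P):
        u, v = P[0] - A[0], P[1] - A[1]
        return (A2[0] + p * u - q * v, A2[1] + q * u + p * v)

    return f


f = affine_map_through((0, 0), (1, 0), (2, 3), (2, 7))
print(f((0, 0)), f((1, 0)))
```

In this example the distance between the points changes from 1 to 4, which illustrates why no number attached to a pair of points can be invariant under all nonsingular affine transformations.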


Chapter 9 
Projective Spaces 


9.1 Definition of a Projective Space 


In plane geometry, points and lines in the plane play very similar roles. In order to 
emphasize this symmetry, the fundamental property that connects points and lines 
in the plane is called incidence, and the fact that a point A lies on a line l or that
a line l passes through a point A expresses in a symmetric form that A and l are
incident. Then one might hope that to each assertion of geometry about incidence 
of points and lines there would correspond another assertion obtained from the first 
by everywhere interchanging the words “point” and “line.” And such is indeed the 
case, with some exceptions. For example, to every pair of distinct points, there is 
incident one and only one line. But it is not true that to every pair of distinct lines, 
there is incident one and only one point: the exception is the case that the lines are 
parallel. Then not a single point is incident to the two lines. 

Projective geometry gives us the possibility of eliminating such exceptions by 
adding to the plane certain points called points at infinity. For example, if we do 
this, then two parallel lines will be incident at some point at infinity. And indeed, 
with a naive perception of the external world, we “see” that parallel lines moving 
away from us converge and intersect at a point on the “horizon.” Strictly speaking, 
the “horizon” is the totality of all points at infinity by which we extend the plane. 

In analyzing this example, we may say that a point p of the plane seen by us 
corresponds to the point where the line passing through p and the center of our 
eye meets the retina. Mathematically, this situation is described using the notion of 
central projection. 

Let us assume that the plane $\Pi$ that we are investigating is contained in
three-dimensional space. Let us choose in this same space some point O not contained
in the plane $\Pi$. Every point A of the plane $\Pi$ can be joined to O by the line OA.
Conversely, a line passing through the point O intersects the plane $\Pi$ in a certain
point, provided that the line is not parallel to $\Pi$. Thus most straight lines passing
through the point O correspond to points $A \in \Pi$. But lines parallel to $\Pi$ intuitively
correspond precisely to points at infinity of the plane $\Pi$, or “points on the horizon.”
See Fig. 9.1. 


I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry,
DOI 10.1007/978-3-642-30994-6_9, © Springer-Verlag Berlin Heidelberg 2013




Fig. 9.1 Central projection 


We shall make this notion the basis of the definition of projective space and shall 
develop it in more detail in the sequel. 


Definition 9.1 Let L be a vector space of finite dimension. The collection of all
lines $\langle x \rangle$, where x is a nonnull vector of the space L, is called a projectivization of
L or projective space P(L). Here the lines $\langle x \rangle$ themselves are called points of the
projective space P(L). The dimension of the space P(L) is defined as the number
dim P(L) = dim L − 1.


As we saw in Chap. 3, all vector spaces of a given dimension n are isomorphic. 
This fact is expressed by saying that there exists only one theory of n-dimensional 
vector spaces. In the same sense, there exists only one theory of n-dimensional 
projective space. 

We shall frequently denote the projective space of dimension n by $P^n$ if we have
no need of indicating the (n + 1)-dimensional vector space on the basis of which it 
was constructed. 

If dim P(L) = 1, then P(L) is called the projective line, and if dim P(L) = 2, then
it is called the projective plane. Lines in an ordinary plane are points on the projective
line, while lines in three-dimensional space are points in the projective plane. 

And as earlier, we give the reader the choice whether to consider L a real or 
complex space, or even to consider it as a space over an arbitrary field K (with 
the exception of certain questions related specifically to real spaces). In accordance 
with the definition given above, we shall say that dimP(L) = —1 if dimL = 0. In 
this case, the set P(L) is empty. 

In order to introduce coordinates in a space P(L) of dimension n, we choose a
basis $e_0, e_1, \dots, e_n$ in the space L. A point $A \in P(L)$ is by definition a line $\langle x \rangle$,
where x is some nonnull vector in L. Thus we have the representation

\[ x = \alpha_0 e_0 + \alpha_1 e_1 + \cdots + \alpha_n e_n. \tag{9.1} \]

The numbers $(\alpha_0, \alpha_1, \dots, \alpha_n)$ are called homogeneous coordinates of the point A.
But the point A is the entire line $\langle x \rangle$. It can also be obtained in the form $\langle y \rangle$ if
$y = \lambda x$ and $\lambda \ne 0$. Then

\[ y = \lambda\alpha_0 e_0 + \lambda\alpha_1 e_1 + \cdots + \lambda\alpha_n e_n. \]

From this it follows that the numbers $(\lambda\alpha_0, \lambda\alpha_1, \dots, \lambda\alpha_n)$ are also homogeneous
coordinates of the point A. That is, homogeneous coordinates are defined only up to
a common nonzero factor. Since by definition, $A = \langle x \rangle$ and $x \ne 0$, they cannot all be
simultaneously equal to zero. In order to emphasize that homogeneous coordinates
are defined only up to a nonzero common factor, they are written in the form

\[ (\alpha_0 : \alpha_1 : \alpha_2 : \cdots : \alpha_n). \tag{9.2} \]

Thus if we wish to express some property of the point A in terms of its homogeneous
coordinates, then that assertion must continue to hold if all the homogeneous
coordinates $(\alpha_0, \alpha_1, \dots, \alpha_n)$ are simultaneously multiplied by the same nonzero number.
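A small Python sketch may help fix the idea that homogeneous coordinates are defined only up to a common nonzero factor. It assumes rational coordinates and uses two hypothetical helpers: one tests whether two coordinate tuples define the same point by checking that all 2×2 minors vanish, the other picks a canonical representative by scaling the first nonzero coordinate to 1.

```python
from fractions import Fraction


def same_projective_point(a, b):
    """True if the nonzero tuples a and b differ by a common nonzero factor."""
    if len(a) != len(b):
        raise ValueError("coordinate tuples must have the same length")
    if all(x == 0 for x in a) or all(x == 0 for x in b):
        raise ValueError("homogeneous coordinates cannot all be zero")
    n = len(a)
    # Proportionality is equivalent to the vanishing of all 2x2 minors.
    return all(a[i] * b[j] == a[j] * b[i] for i in range(n) for j in range(i + 1, n))


def normalize(a):
    """Scale the coordinates so that the first nonzero one equals 1."""
    if all(x == 0 for x in a):
        raise ValueError("homogeneous coordinates cannot all be zero")
    pivot = next(x for x in a if x != 0)
    return tuple(Fraction(x) / Fraction(pivot) for x in a)


print(same_projective_point((1, 2, 3), (2, 4, 6)))
print(same_projective_point((1, 2, 3), (1, 2, 4)))
print(normalize((0, 2, 4)))
```

Two tuples define the same point exactly when their normalized forms coincide, which gives an alternative test.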

Let us assume, for example, that we are considering the points of projective space
whose homogeneous coordinates satisfy the relationship

\[ F(\alpha_0, \alpha_1, \dots, \alpha_n) = 0, \tag{9.3} \]

where F is a polynomial in n + 1 variables. In order for this requirement actually
to be related to the points and not depend on the factor $\lambda$ by which we can
multiply their homogeneous coordinates, it is necessary that along with the numbers
$(\alpha_0, \alpha_1, \dots, \alpha_n)$, the relationship (9.3) be satisfied as well by the numbers
$(\lambda\alpha_0, \lambda\alpha_1, \dots, \lambda\alpha_n)$ for an arbitrary nonzero factor $\lambda$.

Let us elucidate when this requirement is satisfied. To this end, in the polynomial
$F(x_0, x_1, \dots, x_n)$ let us collect all terms of the form $a x_0^{k_0} x_1^{k_1} \cdots x_n^{k_n}$ with
$k_0 + k_1 + \cdots + k_n = m$ and denote their sum by $F_m$. We thereby obtain the representation

\[ F(x_0, x_1, \dots, x_n) = \sum_{m=0}^{N} F_m(x_0, x_1, \dots, x_n). \]

It follows at once from the definition of $F_m$ that

\[ F_m(\lambda x_0, \lambda x_1, \dots, \lambda x_n) = \lambda^m F_m(x_0, x_1, \dots, x_n). \]

From this, we obtain

\[ F(\lambda x_0, \lambda x_1, \dots, \lambda x_n) = \sum_{m=0}^{N} \lambda^m F_m(x_0, x_1, \dots, x_n). \]

Our condition means that the equality $\sum_{m=0}^{N} \lambda^m F_m = 0$ is satisfied for the
coordinates of the points in question and simultaneously for all nonzero values of $\lambda$. Let
us denote by $c_m$ the value $F_m(\alpha_0, \alpha_1, \dots, \alpha_n)$ for some concrete choice of
homogeneous coordinates $(\alpha_0, \alpha_1, \dots, \alpha_n)$. Then we arrive at the condition
$\sum_{m=0}^{N} c_m \lambda^m = 0$ for all nonzero values of $\lambda$. This means that the polynomial
$\sum_{m=0}^{N} c_m \lambda^m$ in the variable $\lambda$ has an infinite number of roots (for simplicity, we are
now assuming that the field $\mathbb{K}$ over which the vector space L is being considered is
infinite; however, it would be possible to eliminate this restriction). Then, by a
well-known theorem on polynomials, all the coefficients $c_m$ are equal to zero. In other
words, our equality (9.3) is reduced to the satisfaction of the relationships

\[ F_m(\alpha_0, \alpha_1, \dots, \alpha_n) = 0, \quad m = 0, 1, \dots, N. \tag{9.4} \]

The polynomial $F_m$ contains only monomials of the same degree m, that is, it is
homogeneous. We see that the property of the point A expressed by an algebraic
relationship between its homogeneous coordinates does not depend on the permissible
selection of coordinates but only on the point A itself if it is expressed by setting the
homogeneous polynomials in its coordinates equal to zero.
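The splitting of a polynomial into its homogeneous components, and the scaling identity for each component, can be checked on an example. In the Python sketch below a polynomial is represented, purely as an assumption for the illustration, by a dictionary mapping exponent tuples (k_0, ..., k_n) to coefficients; the function names are hypothetical.

```python
def evaluate(poly, point):
    """Value of the polynomial at the given point."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for x, k in zip(point, exps):
            term *= x ** k
        total += term
    return total


def homogeneous_components(poly):
    """Group the monomials by total degree m, giving the components F_m."""
    parts = {}
    for exps, coeff in poly.items():
        if coeff != 0:
            parts.setdefault(sum(exps), {})[exps] = coeff
    return parts


def is_homogeneous(poly):
    """True if all monomials with nonzero coefficient have the same degree."""
    return len(homogeneous_components(poly)) <= 1


# F = x0^2 - x1*x2 + 3*x0 has components of degrees 2 and 1.
F = {(2, 0, 0): 1, (0, 1, 1): -1, (1, 0, 0): 3}
parts = homogeneous_components(F)
point, lam = (1, 2, 3), 5
scaled = tuple(lam * x for x in point)
for m, F_m in sorted(parts.items()):
    # Each component satisfies F_m(lam * x) = lam**m * F_m(x).
    print(m, evaluate(F_m, scaled), lam ** m * evaluate(F_m, point))
```

Because F mixes degrees 1 and 2, the condition F = 0 alone is not a property of a point of the projective space; each component has to vanish separately, as in (9.4).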

If $L' \subset L$ is a vector subspace, then $P(L') \subset P(L)$, since every line $\langle x \rangle$ contained in
L' is also contained in L. Such subsets $P(L') \subset P(L)$ are called projective subspaces
of the space P(L). Every P(L') is by definition itself a projective space. Its dimension
is thus defined by dim P(L') = dim L' − 1. By analogy with vector spaces, a projective
subspace $P(L') \subset P(L)$ is called a hyperplane if dim P(L') = dim P(L) − 1, that
is, if dim L' = dim L − 1, and consequently, L' is a hyperplane in L.

A set of points of the space P(L) defined by the relationships

\[
\begin{cases}
F_1(\alpha_0, \alpha_1, \dots, \alpha_n) = 0, \\
F_2(\alpha_0, \alpha_1, \dots, \alpha_n) = 0, \\
\qquad \vdots \\
F_m(\alpha_0, \alpha_1, \dots, \alpha_n) = 0,
\end{cases} \tag{9.5}
\]

where $F_1, F_2, \dots, F_m$ are homogeneous polynomials of differing (in general)
degrees, is called a projective algebraic variety.


Example 9.2 The simplest example of a projective algebraic variety is a projective
subspace. Indeed, as we saw in Sect. 3.7, every vector subspace $L' \subset L$ can
be defined with the aid of a system of linear homogeneous equations, and
consequently, a projective subspace $P(L') \subset P(L)$ can be defined by formula (9.5), in
which m = dim P(L) − dim P(L') and the degree of each of the homogeneous
polynomials $F_1, \dots, F_m$ is equal to 1. Here in the case m = 1, we obtain a hyperplane.


Example 9.3 Another important example of a projective algebraic variety is what
are called projective quadrics. They are given by formula (9.5), where $m = 1$ and
the degree of the sole homogeneous polynomial $F_1$ is equal to 2. We shall consider
quadrics in detail in Chap. 11. The simplest examples of projective quadrics appear
in a course in analytic geometry, namely curves of degree 2 in the projective plane.


Example 9.4 Let us consider the set of points of the projective space $\mathbb{P}(L)$ whose
$i$th homogeneous coordinate (in some basis $e_0, e_1, \ldots, e_n$ of the space $L$) is equal to
zero, and let us denote by $L_i$ the set of vectors of the space $L$ associated with these
points. The subset $L_i \subset L$ is defined in $L$ by a single linear equation $\alpha_i = 0$, and
therefore is a hyperplane. This means that $\mathbb{P}(L_i)$ is a hyperplane in the projective
space $\mathbb{P}(L)$. We shall denote the set of points of the projective space $\mathbb{P}(L)$ whose
$i$th homogeneous coordinate is nonzero by $V_i$. It is obvious that $V_i$ is not a
projective subspace of $\mathbb{P}(L)$.

Fig. 9.2 Affine subset of a projective space


The following construction is a natural generalization of Example 9.4. In the
space $L$ let an arbitrary basis $e_0, e_1, \ldots, e_n$ be chosen. Let us consider some linear
function $\varphi$ on the space $L$ not identically equal to zero. Vectors $x \in L$ for which
$\varphi(x) = 0$ form a hyperplane $L_\varphi \subset L$. It is the subspace of solutions of the "system"
consisting of a single linear homogeneous equation. To it is associated the projective
hyperplane $\mathbb{P}(L_\varphi) \subset \mathbb{P}(L)$. It is obvious that $L_\varphi$ coincides with the hyperplane
$L_i$ from Example 9.4 if the linear function $\varphi$ maps each vector $x \in L$ onto its $i$th
coordinate, that is, $\varphi$ is the $i$th vector of the basis of the space $L^*$ that is dual to the
basis $e_0, e_1, \ldots, e_n$ of the space $L$.

Let us now denote by $W_\varphi$ the set of vectors $x \in L$ for which $\varphi(x) = 1$. This is
again the set of solutions of the "system" consisting of a single linear equation, but
now inhomogeneous. It can be viewed naturally as an affine space with space of
vectors $L_\varphi$. Let us denote the set $\mathbb{P}(L) \setminus \mathbb{P}(L_\varphi)$ by $V_\varphi$. Then for every point $A \in V_\varphi$
there exists a unique vector $x \in W_\varphi$ for which $A = \langle x \rangle$.

In this way, we may identify the set $V_\varphi$ with the set $W_\varphi$, and with the aid of this
identification, consider $V_\varphi$ an affine space. By definition, its space of vectors is $L_\varphi$,
and if $A$ and $B$ are two points in $V_\varphi$, then there exist two vectors $x$ and $y$ for which
$\varphi(x) = 1$ and $\varphi(y) = 1$ such that $A = \langle x \rangle$ and $B = \langle y \rangle$, and then $\overrightarrow{AB} = y - x$.
Thus the $n$-dimensional projective space $\mathbb{P}(L)$ can be represented as the union of
the $n$-dimensional affine space $V_\varphi$ and the projective hyperplane $\mathbb{P}(L_\varphi) \subset \mathbb{P}(L)$; see
Fig. 9.2. In the sequel, we shall call $V_\varphi$ an affine subset of the space $\mathbb{P}(L)$.

Let us choose in the space $L$ a basis $e_0, \ldots, e_n$ such that $\varphi(e_0) = 1$ and $\varphi(e_i) = 0$
for all $i = 1, \ldots, n$. Then the vector $e_0$ is associated with the point $O = \langle e_0 \rangle$
belonging to the affine subset $V_\varphi$, while all the remaining vectors $e_1, \ldots, e_n$ are in
$L_\varphi$, and they are associated with the points $\langle e_1 \rangle, \ldots, \langle e_n \rangle$ lying in the hyperplane
$\mathbb{P}(L_\varphi)$. We have thus constructed in the affine space $(V_\varphi, L_\varphi)$ a frame of reference
$(O; e_1, \ldots, e_n)$. The coordinates $(\xi_1, \ldots, \xi_n)$ of the point $A \in V_\varphi$ with respect to
this frame of reference are called inhomogeneous coordinates of the point $A$ in our
projective space. We wish to emphasize that they are defined only for points in
the affine subset $V_\varphi$. If we return to the definitions, then we see that the inhomogeneous
coordinates $(\xi_1, \ldots, \xi_n)$ are obtained from the homogeneous coordinates
(9.2) through the formula

$$\xi_i = \frac{\alpha_i}{\alpha_0}, \qquad i = 1, \ldots, n. \tag{9.6}$$

It is obvious here that for $x$ from formula (9.1), the function $\varphi$ that we have chosen
assumes the value $\varphi(x) = \alpha_0$.

In order to extend the concept of inhomogeneous coordinates to all points of
a projective space $\mathbb{P}(L) = V_\varphi \cup \mathbb{P}(L_\varphi)$, it remains also to consider the points of
the projective hyperplane $\mathbb{P}(L_\varphi)$. For such points it is natural to assign the value
$\alpha_0 = 0$. Sometimes this is expressed by saying that the inhomogeneous coordinates
$(\xi_1, \ldots, \xi_n)$ of the point $A \in \mathbb{P}(L_\varphi)$ assume infinite values, which justifies thinking
of $\mathbb{P}(L_\varphi)$ as a set of "points at infinity" (horizon) for the affine subset $V_\varphi$.

Of course, one could also choose a linear function $\varphi$ such that $\varphi(e_i) = 1$ for
some number $i \in \{0, \ldots, n\}$, not necessarily equal to 0, as was done above, and
$\varphi(e_j) = 0$ for all $j \neq i$. We will denote the associated spaces $V_\varphi$ and $L_\varphi$ by $V_i$ and
$L_i$. In this case, the projective space $\mathbb{P}(L)$ can be represented in the analogous form
$V_i \cup \mathbb{P}(L_i)$, that is, as the union of an affine part $V_i$ and a hyperplane $\mathbb{P}(L_i)$ for
the corresponding value $i \in \{0, \ldots, n\}$. Sometimes this fact is expressed by saying
that in the projective space $\mathbb{P}(L)$, one may introduce various affine charts. It is not
difficult to see that every point $A$ of a projective space $\mathbb{P}(L)$ is "finite" for some value
$i \in \{0, \ldots, n\}$, that is, it belongs to the subset $V_i$ for the corresponding value $i$. This
follows from the fact that by definition, homogeneous coordinates (9.2) of the point
$A$ are not simultaneously equal to zero. If $\alpha_i \neq 0$ for some $i \in \{0, \ldots, n\}$, then $A$ is
contained in the associated affine subset $V_i$.
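The passage from homogeneous to inhomogeneous coordinates can be illustrated by a short computation. The following is only a sketch over the rational numbers: the function names `inhomogeneous` and `finite_charts` are our own and do not appear in the text, and for a chart $V_i$ with $i \neq 0$ we assume the evident analogue of formula (9.6), namely division by $\alpha_i$ and omission of the $i$th coordinate.

```python
from fractions import Fraction

def inhomogeneous(alpha, i=0):
    """Coordinates of the point (alpha_0 : ... : alpha_n) in the affine chart V_i.

    Returns None when alpha_i == 0, that is, when the point lies in the
    hyperplane P(L_i) of "points at infinity" for this chart.
    """
    if all(a == 0 for a in alpha):
        raise ValueError("homogeneous coordinates must not all be zero")
    if alpha[i] == 0:
        return None
    # Divide by alpha_i and drop the i-th coordinate (formula (9.6) for i = 0).
    return tuple(Fraction(a, alpha[i]) for k, a in enumerate(alpha) if k != i)

def finite_charts(alpha):
    """Indices i for which the point belongs to the affine subset V_i."""
    return [i for i, a in enumerate(alpha) if a != 0]
```

Proportional coordinates such as (2 : 4 : 6) and (4 : 8 : 12) give the same result in every chart, and a point such as (0 : 1 : 5), which is "at infinity" for $V_0$, is finite in the charts $V_1$ and $V_2$.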

If $L'$ and $L''$ are two subspaces of a space $L$, then it is obvious that

$$\mathbb{P}(L') \cap \mathbb{P}(L'') = \mathbb{P}(L' \cap L''). \tag{9.7}$$

It is somewhat more complicated to interpret the set $\mathbb{P}(L' + L'')$. It is obvious that
it does not coincide with $\mathbb{P}(L') \cup \mathbb{P}(L'')$. For example, if $L'$ and $L''$ are two distinct
lines in the plane $L$, then the set $\mathbb{P}(L') \cup \mathbb{P}(L'')$ consisting of two points is in general
not a projective subspace of the space $\mathbb{P}(L)$.

To give a geometric interpretation to the sets $\mathbb{P}(L' + L'')$, we shall introduce the
following notion. Let $P = \langle e \rangle$ and $P' = \langle e' \rangle$ be two distinct points of the projective
space $\mathbb{P}(L)$. Let us set $L_1 = \langle e, e' \rangle$ and consider the one-dimensional projective
subspace $\mathbb{P}(L_1)$. It obviously contains both points $P$ and $P'$, and moreover, it is
contained in every projective subspace containing the points $P$ and $P'$. Indeed, if
$L_2 \subset L$ is a vector subspace such that $\mathbb{P}(L_2)$ contains the points $P$ and $P'$, then this
means that $L_2$ contains the vectors $e$ and $e'$, which implies that it also contains the
entire subspace $L_1 = \langle e, e' \rangle$. Therefore, by the definition of a projective subspace,
we have that $\mathbb{P}(L_1) \subset \mathbb{P}(L_2)$.


Definition 9.5 The one-dimensional projective subspace $\mathbb{P}(L_1)$ constructed from
two given points $P \neq P'$ is called the line connecting the points $P$ and $P'$.




Theorem 9.6 Let $L'$ and $L''$ be two subspaces of a vector space $L$. Then the union
of lines connecting all possible points of $\mathbb{P}(L')$ with all possible points of $\mathbb{P}(L'')$
coincides with the projective subspace $\mathbb{P}(L' + L'')$.


Proof We shall denote by $\Sigma$ the union of lines described in the statement of the
theorem. Every such line has the form $\mathbb{P}(L_1)$, where $L_1 = \langle e', e'' \rangle$, for vectors $e' \in L'$
and $e'' \in L''$. Since $e', e'' \in L' + L''$, it follows from the preceding discussion that
every such line $\mathbb{P}(L_1)$ belongs to $\mathbb{P}(L' + L'')$. Thus we have proved the set inclusion
$\Sigma \subset \mathbb{P}(L' + L'')$.

Conversely, suppose now that the point $S \in \mathbb{P}(L)$ belongs to the projective subspace
$\mathbb{P}(L' + L'')$. This means that $S = \langle e \rangle$, where the vector $e$ is in $L' + L''$. And
this implies that the vector $e$ can be represented in the form $e = e' + e''$, where
$e' \in L'$ and $e'' \in L''$. This means that $S = \langle e \rangle$ and the vector $e$ belongs to the plane
$\langle e', e'' \rangle$, that is, $S$ lies on the line connecting the point $\langle e' \rangle$ in $\mathbb{P}(L')$ to the point $\langle e'' \rangle$
in $\mathbb{P}(L'')$. In other words, we have $S \in \Sigma$, and thus the subspace $\mathbb{P}(L' + L'')$ is contained
in $\Sigma$. Taking into account the reverse inclusion proved above, we obtain the
required equality $\Sigma = \mathbb{P}(L' + L'')$.


Definition 9.7 The set $\mathbb{P}(L' + L'')$ is called a projective cover of the set $\mathbb{P}(L') \cup \mathbb{P}(L'')$
and is denoted by

$$\mathbb{P}(L' + L'') = \overline{\mathbb{P}(L') \cup \mathbb{P}(L'')}. \tag{9.8}$$

Recalling Theorem 3.41, we obtain the following result.


Theorem 9.8 If $P'$ and $P''$ are two projective subspaces of a projective space $\mathbb{P}(L)$,
then

$$\dim(P' \cap P'') + \dim(\overline{P' \cup P''}) = \dim P' + \dim P''. \tag{9.9}$$

Example 9.9 If $P'$ and $P''$ are two lines in the projective plane $\mathbb{P}(L)$, $\dim L = 3$, then
$\dim P' = \dim P'' = 1$ and $\dim(\overline{P' \cup P''}) \le 2$, and from relationship (9.9), we obtain
that $\dim(P' \cap P'') \ge 0$, that is, every pair of lines in the projective plane intersect.


The theory of projective spaces exhibits a beautiful symmetry, which goes under 
the name duality (we have already encountered an analogous phenomenon in the 
theory of vector spaces; see Sect. 3.7). 

Let $L^*$ be the dual space to $L$. The projective space $\mathbb{P}(L^*)$ is called the dual of
$\mathbb{P}(L)$. Every point of the dual space $\mathbb{P}(L^*)$ is by definition a line $\langle f \rangle$, where $f$ is
a linear function on the space $L$ not identically zero. Such a function determines a
hyperplane $L_f \subset L$, given by the linear homogeneous equation $f(x) = 0$ in the vector
space $L$, and with it the hyperplane $P_f = \mathbb{P}(L_f)$ in the projective
space $\mathbb{P}(L)$.

Let us prove that the correspondence constructed above between points $\langle f \rangle$ of the
dual space $\mathbb{P}(L^*)$ and hyperplanes $P_f$ of the space $\mathbb{P}(L)$ is a bijection. To do so, we
must prove that the equations $f = 0$ and $\alpha f = 0$ with $\alpha \neq 0$ are equivalent, defining one and the
same hyperplane, that is, $P_f = P_{\alpha f}$. As was shown in Sect. 3.7, every hyperplane
$L' \subset L$ is determined by a single nonzero linear equation. Two different equations
$f = 0$ and $f_1 = 0$ can define one and the same hyperplane only if $f_1 = \alpha f$, where
$\alpha$ is some nonzero number. Indeed, in the contrary case, the system of the two
equations $f = 0$ and $f_1 = 0$ has rank 2, and therefore, since $\dim L = n + 1$, it defines a subspace $L''$ of
dimension $n - 1$ in $L$ and a subspace $\mathbb{P}(L'') \subset \mathbb{P}(L)$ of dimension $n - 2$, which is
obviously not a hyperplane. Thus the dual space $\mathbb{P}(L^*)$ can be interpreted as the
space of hyperplanes in $\mathbb{P}(L)$. This is the simplest example of the fact that certain
geometric objects cannot be described by numbers (in the way that, for example, vector
spaces can be described by their dimension), but constitute a set having a geometric
character. We shall encounter more complex examples in Chap. 10.

There is also a much more general fact, namely that there is a bijection between
$m$-dimensional projective subspaces of the space $\mathbb{P}(L)$ (of dimension $n$) and subspaces
of dimension $n - m - 1$ of the space $\mathbb{P}(L^*)$. We shall now describe this correspondence,
and the reader will easily verify that for $m = n - 1$, this coincides with the
above-described correspondence between hyperplanes in $\mathbb{P}(L)$ and points in $\mathbb{P}(L^*)$.

Let $L' \subset L$ be a subspace of dimension $m + 1$, so that $\dim \mathbb{P}(L') = m$. Let us consider
in the dual space $L^*$ the annihilator $(L')^a$ of the subspace $L'$. Let us recall that
the annihilator is the subspace $(L')^a \subset L^*$ consisting of all linear functions $f \in L^*$
such that $f(x) = 0$ for all vectors $x \in L'$. As we established in Sect. 3.7 (formula
(3.54)), the dimension of the annihilator is equal to

$$\dim (L')^a = \dim L - \dim L' = n - m. \tag{9.10}$$


The projective subspace $\mathbb{P}((L')^a) \subset \mathbb{P}(L^*)$ is called the dual to the subspace
$\mathbb{P}(L') \subset \mathbb{P}(L)$. By (9.10), its dimension is $n - m - 1$. What we have here is a variant
of a concept that is well known to us. If a nonsingular symmetric bilinear form
$(x, y)$ is defined on the space $L$, then we can identify $(L')^a$ with the orthogonal complement
to $L'$, which was denoted by $(L')^\perp$; see p. 198. If we write the bilinear form
$(x, y)$ in some orthonormal basis of the space $L$, then it takes the form $\sum_{i=0}^{n} x_i y_i$,
and the point with coordinates $(y_0, y_1, \ldots, y_n)$ will correspond to the hyperplane
defined by the equation

$$\sum_{i=0}^{n} x_i y_i = 0,$$

in which $y_0, \ldots, y_n$ are taken as fixed, and $x_0, \ldots, x_n$ are variables.

The assertions we have proved, together with the duality principle established in
Sect. 3.7, lead automatically to the following result, called the principle of projective
duality.


Proposition 9.10 (Principle of projective duality) If a theorem is proved for all
projective spaces of a given finite dimension $n$ over a given field $\mathbb{K}$ in a formulation
that uses only the concepts of projective subspace, dimension, projective cover, and
intersection, then for all such spaces, one has also the dual theorem obtained from
the original one by the following substitutions:

dimension $m$                                  dimension $n - m - 1$
intersection $P_1 \cap P_2$                    projective cover $\overline{P_1 \cup P_2}$
projective cover $\overline{P_1 \cup P_2}$     intersection $P_1 \cap P_2$


For example, the assertion “through two distinct points of the projective plane 
there passes one line” has as its dual assertion “every pair of distinct lines in the 
projective plane intersect in one point.” 

One may try to extend this principle in such a way that it will cover not only 
projective spaces, but also the projective algebraic varieties described by equation 
(9.5). However, in this regard there appear some new difficulties, which we shall 
only mention here without going into detail. 

Assume, for example, that a projective algebraic variety $X \subset \mathbb{P}(L)$ is given by
the single equation

$$F(x_0, x_1, \ldots, x_n) = 0,$$

where $F$ is a homogeneous polynomial. To every point $A \in X$ there corresponds a
hyperplane given by the equation

$$\sum_{i=0}^{n} \frac{\partial F}{\partial x_i}(A)\, x_i = 0, \tag{9.11}$$

called the tangent hyperplane to $X$ at the point $A$ (this notion will be discussed later
in greater detail). By the above considerations, we can assign to this hyperplane the
point $B$ of the dual space $\mathbb{P}(L^*)$.

It is natural to suppose that as $A$ runs through all points of $X$, the point $B$ also
runs through some projective algebraic variety in the space $\mathbb{P}(L^*)$, called the dual
to the original variety $X$. This is indeed the case, except for certain unpleasant exceptions.
Namely, for some point $A$, it could be the case that all partial derivatives
$\frac{\partial F}{\partial x_i}(A)$ are equal to 0 for $i = 0, 1, \ldots, n$, and equation (9.11) takes the form of the
identity $0 = 0$. Such points are called singular points of the projective algebraic variety
$X$. In this case, we do not obtain any hyperplane, and therefore, we cannot use
the indicated method to assign to the point $A$ a given point of the space $\mathbb{P}(L^*)$. It
is possible to prove that singular points are in some sense exceptional. Moreover,
many very interesting varieties have no singular points at all, so that for them, the
dual variety exists. But then in the dual variety, there appear singular points, so that
the beautiful symmetry nevertheless disappears. Overcoming all these difficulties
is the task of algebraic geometry. We shall not go deeply into this, and we have
mentioned it only in connection with the fact that in Chap. 11, devoted to quadrics,
we shall consider precisely the special case in which these difficulties do not appear.
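Equation (9.11) and the notion of a singular point can be made concrete with a small computation. In the sketch below, a homogeneous polynomial is stored as a dictionary that maps exponent tuples to coefficients; this representation and the names `partial`, `evaluate`, and `tangent_hyperplane` are assumptions made only for the illustration.

```python
def partial(poly, i):
    """Partial derivative with respect to x_i of a polynomial {exponents: coefficient}."""
    out = {}
    for exps, c in poly.items():
        if exps[i] > 0:
            lowered = list(exps)
            lowered[i] -= 1
            key = tuple(lowered)
            out[key] = out.get(key, 0) + c * exps[i]
    return out

def evaluate(poly, point):
    """Value of the polynomial at the given coordinates."""
    total = 0
    for exps, c in poly.items():
        term = c
        for x, e in zip(point, exps):
            term *= x ** e
        total += term
    return total

def tangent_hyperplane(poly, point):
    """Coefficients of equation (9.11) at a point of X, or None at a singular point."""
    if evaluate(poly, point) != 0:
        raise ValueError("the point does not lie on the variety")
    grad = tuple(evaluate(partial(poly, i), point) for i in range(len(point)))
    return None if all(g == 0 for g in grad) else grad
```

For the conic $x_0^2 + x_1^2 - x_2^2 = 0$ and the point (3 : 4 : 5), the coefficients are (6, 8, -10), and the point itself satisfies the resulting equation. For the pair of lines $x_0 x_1 = 0$, all partial derivatives vanish at (0 : 0 : 1), so this point is singular and no hyperplane is obtained.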




9.2 Projective Transformations 


Let $\mathcal{A}$ be a linear transformation of a vector space $L$ into itself. It is natural to
entertain the idea of extending it to the projective space $\mathbb{P}(L)$. It would seem to be
something easy to do: one has only to associate with each point $P \in \mathbb{P}(L)$ corresponding
to the line $\langle e \rangle$ in $L$, the line $\langle \mathcal{A}(e) \rangle$, which is some point of the projective
space $\mathbb{P}(L)$. However, here we encounter the following difficulty: If $\mathcal{A}(e) = 0$, then
we cannot construct the line $\langle \mathcal{A}(e) \rangle$, since all vectors proportional to $\mathcal{A}(e)$ are the
null vector. Thus the transformation that we wish to construct is not defined in general
for all points of the projective space $\mathbb{P}(L)$. If we wish to define it for
all points, then we must require that the kernel of the transformation $\mathcal{A}$ be $(0)$. As
we know, this condition is equivalent to the transformation $\mathcal{A} : L \to L$ being nonsingular.
Thus to all nonsingular transformations $\mathcal{A}$ of the space $L$ into itself (and only
these) there correspond mappings of the projective space $\mathbb{P}(L)$ into itself. We shall
denote them by $\mathbb{P}(\mathcal{A})$.

We have seen that a nonsingular transformation $\mathcal{A} : L \to L$ defines a bijective
mapping of the space $L$ into itself. Let us prove that in this case, the corresponding
mapping $\mathbb{P}(\mathcal{A}) : \mathbb{P}(L) \to \mathbb{P}(L)$ is also a bijection. First, let us verify that its image
coincides with all of $\mathbb{P}(L)$. Let $P$ be a point of the space $\mathbb{P}(L)$. It corresponds to some
line $\langle e \rangle$ in $L$. Since the transformation $\mathcal{A}$ is nonsingular, it follows that $e = \mathcal{A}(e')$
for some vector $e' \in L$, and moreover, $e' \neq 0$, since $e \neq 0$. If $P'$ is the point of the
space $\mathbb{P}(L)$ corresponding to the line $\langle e' \rangle$, then $P = \mathbb{P}(\mathcal{A})(P')$. It remains to show
that $\mathbb{P}(\mathcal{A})$ cannot map two distinct points into one. Let us suppose that $P \neq P'$ and

$$\mathbb{P}(\mathcal{A})(P) = \mathbb{P}(\mathcal{A})(P') = \overline{P}, \tag{9.12}$$

where the points $P$, $P'$, and $\overline{P}$ correspond to the lines $\langle e \rangle$, $\langle e' \rangle$, and $\langle \overline{e} \rangle$ respectively.

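The condition $P \neq P'$ is equivalent to the vectors $e$ and $e'$ being linearly independent,
while from equality (9.12) it follows that $\langle \mathcal{A}(e) \rangle = \langle \mathcal{A}(e') \rangle = \langle \overline{e} \rangle$,
which means that the vectors $\mathcal{A}(e)$ and $\mathcal{A}(e')$ are linearly dependent. But if
$\alpha \mathcal{A}(e) + \beta \mathcal{A}(e') = 0$, where $\alpha \neq 0$ or $\beta \neq 0$, then $\mathcal{A}(\alpha e + \beta e') = 0$, and since
the transformation $\mathcal{A}$ is nonsingular, we have $\alpha e + \beta e' = 0$, which contradicts the
condition $P \neq P'$. Thus we have proved that the mapping $\mathbb{P}(\mathcal{A}) : \mathbb{P}(L) \to \mathbb{P}(L)$ is a
bijection. Consequently, the inverse mapping $\mathbb{P}(\mathcal{A})^{-1}$ is also defined.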


Definition 9.11 A mapping P(A) of the projective space P(L) corresponding to the 
nonsingular transformation A of a vector space L into itself is called a projective 
transformation of the space P(L). 


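Theorem 9.12 We have the following assertions:

(1) $\mathbb{P}(\mathcal{A}_1) = \mathbb{P}(\mathcal{A}_2)$ if and only if $\mathcal{A}_2 = \lambda \mathcal{A}_1$, where $\lambda$ is some nonzero scalar.

(2) If $\mathcal{A}_1$ and $\mathcal{A}_2$ are two nonsingular transformations of a vector space $L$, then
$\mathbb{P}(\mathcal{A}_1 \mathcal{A}_2) = \mathbb{P}(\mathcal{A}_1)\mathbb{P}(\mathcal{A}_2)$.

(3) If $\mathcal{A}$ is a nonsingular transformation, then $\mathbb{P}(\mathcal{A})^{-1} = \mathbb{P}(\mathcal{A}^{-1})$.

(4) A projective transformation $\mathbb{P}(\mathcal{A})$ carries every projective subspace of the space
$\mathbb{P}(L)$ into a subspace of the same dimension.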


Proof All the assertions of the theorem follow directly from the definitions.

(1) If $\mathcal{A}_2 = \lambda \mathcal{A}_1$, then it is obvious that $\mathcal{A}_1$ and $\mathcal{A}_2$ map lines of the vector space
$L$ in exactly the same way, that is, $\mathbb{P}(\mathcal{A}_1) = \mathbb{P}(\mathcal{A}_2)$. Now suppose, conversely, that
$\mathbb{P}(\mathcal{A}_1)(A) = \mathbb{P}(\mathcal{A}_2)(A)$ for an arbitrary point $A \in \mathbb{P}(L)$. If the point $A$ corresponds
to the line $\langle e \rangle$, then we have $\langle \mathcal{A}_1(e) \rangle = \langle \mathcal{A}_2(e) \rangle$, that is,

$$\mathcal{A}_2(e) = \lambda \mathcal{A}_1(e), \tag{9.13}$$

where $\lambda$ is some scalar. However, in theory, the number $\lambda$ in relationship (9.13) could
have had its own value for each vector $e$. Let us consider two linearly independent
vectors $x$ and $y$ and for the vectors $x$, $y$, and $x + y$, let us write down condition
(9.13):

$$\begin{aligned}
\mathcal{A}_2(x) &= \lambda \mathcal{A}_1(x), \\
\mathcal{A}_2(y) &= \mu \mathcal{A}_1(y), \\
\mathcal{A}_2(x + y) &= \nu \mathcal{A}_1(x + y).
\end{aligned} \tag{9.14}$$

In view of the linearity of $\mathcal{A}_1$ and $\mathcal{A}_2$, we have

$$\mathcal{A}_1(x + y) = \mathcal{A}_1(x) + \mathcal{A}_1(y), \qquad \mathcal{A}_2(x + y) = \mathcal{A}_2(x) + \mathcal{A}_2(y). \tag{9.15}$$

Having substituted expressions (9.15) into the third equality of (9.14), we then subtract
from it the first and second equalities. We then obtain

$$(\nu - \lambda)\mathcal{A}_1(x) + (\nu - \mu)\mathcal{A}_1(y) = \mathcal{A}_1\bigl((\nu - \lambda)x + (\nu - \mu)y\bigr) = 0.$$

Since the transformation $\mathcal{A}_1$ is nonsingular (by the definition of a projective transformation),
it follows that $(\nu - \lambda)x + (\nu - \mu)y = 0$, and in view of the linear independence
of the vectors $x$ and $y$, it follows from this that $\lambda = \nu$ and $\mu = \nu$, that is, all
the scalars $\lambda, \mu, \nu$ in (9.14) are the same, and therefore the scalar $\lambda$ in relationship
(9.13) is one and the same for all vectors $e \in L$.

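(2) We must prove that for every point $P$ with corresponding line $\langle e \rangle$, we have
the equality $\mathbb{P}(\mathcal{A}_1 \mathcal{A}_2)(P) = \mathbb{P}(\mathcal{A}_1)(\mathbb{P}(\mathcal{A}_2)(P))$, and this, by the definition of a projective
transformation, follows from the fact that $\langle (\mathcal{A}_1 \mathcal{A}_2)(e) \rangle = \langle \mathcal{A}_1(\mathcal{A}_2(e)) \rangle$. The
last equality follows from the definition of the product of linear transformations.

(3) By what we have proven, we have the equality $\mathbb{P}(\mathcal{A})\mathbb{P}(\mathcal{A}^{-1}) = \mathbb{P}(\mathcal{A}\mathcal{A}^{-1}) =
\mathbb{P}(\mathcal{E})$. It is obvious that $\mathbb{P}(\mathcal{E})$ is the identity transformation of the space $\mathbb{P}(L)$ into
itself. From this, it follows that $\mathbb{P}(\mathcal{A})^{-1} = \mathbb{P}(\mathcal{A}^{-1})$.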

(4) Finally, let $L'$ be an $m$-dimensional subspace of the vector space $L$ and let
$\mathbb{P}(L')$ be the associated $(m - 1)$-dimensional projective subspace. The mapping
$\mathbb{P}(\mathcal{A})$ takes $\mathbb{P}(L')$ into a collection of points of the form $P'' = \langle \mathcal{A}(e') \rangle$, where
$P' = \langle e' \rangle$ runs through all points of $\mathbb{P}(L')$. This holds because $e'$ runs through
all nonnull vectors of the space $L'$. Let us prove that here, all vectors $\mathcal{A}(e')$ coincide with
the nonnull vectors of some vector subspace $L''$ having the same dimension as $L'$.
This will give us the required assertion.

In the subspace $L'$, let us choose a basis $e_1, \ldots, e_m$. Then every vector $e' \in L'$ can
be represented in the form

$$e' = \alpha_1 e_1 + \cdots + \alpha_m e_m,$$

while the condition $e' \neq 0$ is equivalent to not all the coefficients $\alpha_i$ being equal to
zero. From this, we obtain

$$\mathcal{A}(e') = \alpha_1 \mathcal{A}(e_1) + \cdots + \alpha_m \mathcal{A}(e_m). \tag{9.16}$$

The vectors $\mathcal{A}(e_1), \ldots, \mathcal{A}(e_m)$ are linearly independent, since the transformation
$\mathcal{A} : L \to L$ is nonsingular. Let us consider the $m$-dimensional subspace $L'' =
\langle \mathcal{A}(e_1), \ldots, \mathcal{A}(e_m) \rangle$. From the relationship (9.16), it follows that the transformation
$\mathbb{P}(\mathcal{A})$ takes the points of the subspace $\mathbb{P}(L')$ precisely into the points of the subspace
$\mathbb{P}(L'')$. From the equality $\dim L' = \dim L'' = m$, we obtain $\dim \mathbb{P}(L') = \dim \mathbb{P}(L'') =
m - 1$.
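Assertions (1) and (2) of Theorem 9.12 are easy to try out numerically. The following sketch works with square matrices given as lists of rows; the helper names `apply`, `scale`, `matmul`, and `same_point` are ours, and `same_point` tests proportionality of two coordinate vectors, that is, equality of the corresponding points.

```python
def apply(matrix, v):
    """Coordinates of the image of the vector v under the given matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def scale(matrix, lam):
    """The matrix of the transformation lam * A."""
    return [[lam * a for a in row] for row in matrix]

def matmul(a, b):
    """Matrix of the product of transformations (first b, then a)."""
    n = len(b)
    return [[sum(row[k] * b[k][j] for k in range(n)) for j in range(len(b[0]))]
            for row in a]

def same_point(u, v):
    """True if u and v are nonzero and proportional, i.e. define one point."""
    if all(x == 0 for x in u) or all(x == 0 for x in v):
        return False
    size = len(u)
    return all(u[i] * v[j] == u[j] * v[i] for i in range(size) for j in range(size))
```

With the nonsingular matrix [[1, 2], [3, 4]], the vector (1, 1) goes to (3, 7), and under three times that matrix to (9, 21); these are different vectors but the same point of the projective line.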


By analogy with linear and affine transformations, there is a hope that we can describe
a projective transformation unambiguously by how it maps a certain number
of "sufficiently independent" points. As a first attempt, we may consider the points
$P_i = \langle e_i \rangle$ for $i = 0, 1, \ldots, n$, where $e_0, e_1, \ldots, e_n$ is a basis of the space $L$. But this
path does not lead to our goal, for there exist too many distinct transformations taking
each point $P_i$ into itself. Indeed, such are all the transformations of the form
$\mathbb{P}(\mathcal{A})$ if $\mathcal{A}(e_i) = \lambda_i e_i$ with arbitrary $\lambda_i \neq 0$, that is, in other words, if $\mathcal{A}$ has, in the
basis $e_0, e_1, \ldots, e_n$, the matrix

$$A = \begin{pmatrix}
\lambda_0 & 0 & \cdots & 0 \\
0 & \lambda_1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}.$$

In this case, $\langle \mathcal{A}(e_i) \rangle = \langle e_i \rangle$ for all $i = 0, 1, \ldots, n$. However, the image of an arbitrary
vector

$$e = \alpha_0 e_0 + \alpha_1 e_1 + \cdots + \alpha_n e_n$$

is equal to

$$\mathcal{A}(e) = \alpha_0 \lambda_0 e_0 + \alpha_1 \lambda_1 e_1 + \cdots + \alpha_n \lambda_n e_n,$$

and this vector is in general not proportional to $e$ unless all the $\lambda_i$ are identical. Thus even
knowing how the transformation $\mathbb{P}(\mathcal{A})$ maps the points $P_0, P_1, \ldots, P_n$, we are not
yet able to determine it uniquely. But it turns out that the addition of one more point
(under some weak assumptions) describes the transformation uniquely. For this, we
need to introduce a new concept.




Definition 9.13 In the $n$-dimensional projective space $\mathbb{P}(L)$, $n + 2$ points

$$P_0, P_1, \ldots, P_n, P_{n+1} \tag{9.17}$$

are said to be independent if no $n + 1$ of them lie in a subspace of dimension less
than $n$.


For example, four points in the projective plane are independent if no three of 
them are collinear. 
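Definition 9.13 can be tested directly: $n + 1$ points fail to lie in a subspace of dimension less than $n$ exactly when their representative vectors are linearly independent, that is, when the determinant formed from their coordinates is nonzero. The sketch below checks every choice of $n + 1$ points among the $n + 2$ in this way, using exact rational arithmetic; the function names `det` and `independent` are ours.

```python
from fractions import Fraction
from itertools import combinations

def det(rows):
    """Determinant of a square matrix, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d          # a row swap changes the sign
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d

def independent(points):
    """True if the n + 2 given points of an n-dimensional projective space are independent."""
    n = len(points) - 2
    if any(len(p) != n + 1 for p in points):
        raise ValueError("expected n + 2 points with n + 1 homogeneous coordinates each")
    return all(det(subset) != 0 for subset in combinations(points, n + 1))
```

In the projective plane, the points (1 : 0 : 0), (0 : 1 : 0), (0 : 0 : 1), (1 : 1 : 1) are independent, while replacing the last one by (1 : 1 : 0) gives a dependent system, since that point is collinear with the first two.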

Let us explore what the condition of independence means if to the point $P_i$
there corresponds the line $\langle e_i \rangle$, $i = 0, \ldots, n + 1$. Since by definition, the points
$P_0, P_1, \ldots, P_n$ do not lie in a subspace of dimension less than $n$, it follows that the
vectors $e_0, e_1, \ldots, e_n$ do not lie in a subspace of dimension less than $n + 1$, that
is, they are linearly independent, and this means that they constitute a basis of the
space $L$. Thus the vector $e_{n+1}$ is a linear combination of these vectors:

$$e_{n+1} = \alpha_0 e_0 + \alpha_1 e_1 + \cdots + \alpha_n e_n. \tag{9.18}$$

If some scalar $\alpha_i$ is equal to 0, then from (9.18), it follows that the vector $e_{n+1}$
lies in the subspace $L' = \langle e_0, \ldots, \tilde{e}_i, \ldots, e_n \rangle$, where the sign ~ indicates the omission
of the corresponding vector. Consequently, the vectors $e_0, \ldots, \tilde{e}_i, \ldots, e_n, e_{n+1}$
lie in a subspace $L'$ whose dimension does not exceed $n$. But this means that the
points $P_0, \ldots, \tilde{P}_i, \ldots, P_n, P_{n+1}$ lie in the projective space $\mathbb{P}(L')$, and moreover,
$\dim \mathbb{P}(L') \le n - 1$, that is, they are dependent.

Let us show that for the independence of points (9.17), it suffices that in the
decomposition (9.18), all coefficients $\alpha_i$ be nonzero. Let the vectors $e_0, e_1, \ldots, e_n$
form a basis of the space $L$, while the vector $e_{n+1}$ is a linear combination (9.18)
of them such that all the $\alpha_i$ are nonzero. Let us show that then, the points (9.17)
are independent. If this were not the case, then some $n + 1$ vectors from among
$e_0, e_1, \ldots, e_{n+1}$ of the space $L$ would lie in a subspace of dimension not greater
than $n$. These cannot be the vectors $e_0, e_1, \ldots, e_n$, since by assumption, they constitute
a basis of $L$. So let them be the vectors $e_0, \ldots, \tilde{e}_i, \ldots, e_n, e_{n+1}$ for some $i < n + 1$,
and let their linear dependence be expressed by the equality

$$\lambda_0 e_0 + \cdots + \lambda_{i-1} e_{i-1} + \lambda_{i+1} e_{i+1} + \cdots + \lambda_{n+1} e_{n+1} = 0,$$

where $\lambda_{n+1} \neq 0$, since the vectors $e_0, e_1, \ldots, e_n$ are linearly independent. From
this, it follows that the vector $e_{n+1}$ is a linear combination of the vectors
$e_0, \ldots, \tilde{e}_i, \ldots, e_n$. But this contradicts the condition that in the expression (9.18),
all the $\alpha_i$ are nonzero, since the vectors $e_0, e_1, \ldots, e_n$ form a basis of the space $L$,
and the decomposition (9.18) for an arbitrary vector $e_{n+1}$ uniquely determines its
coordinates $\alpha_i$.

Thus, $n + 2$ independent points (9.17) are always obtained from $n + 1$ points
$P_i = \langle e_i \rangle$ whose corresponding vectors $e_i$ form a basis of the space $L$ by the addition
of one more point $P = \langle e \rangle$ for which the vector $e$ is a linear combination of the
vectors $e_i$ with all nonzero coefficients.

We can now formulate our main result. 


Theorem 9.14 Let

$$P_0, P_1, \ldots, P_n, P_{n+1}; \qquad P'_0, P'_1, \ldots, P'_n, P'_{n+1} \tag{9.19}$$

be two systems of independent points of the projective space $\mathbb{P}(L)$ of dimension $n$.
Then there exists a projective transformation taking the point $P_i$ to $P'_i$ for all $i =
0, 1, \ldots, n + 1$, and moreover, it is unique.


Proof We shall use the interpretation of the property of independence of points
obtained above. Let the points $P_i$ correspond to the lines $\langle e_i \rangle$, and let the points $P'_i$
correspond to the lines $\langle e'_i \rangle$. We may assume that the vectors $e_0, \ldots, e_n$ and the vectors
$e'_0, \ldots, e'_n$ are bases of the $(n + 1)$-dimensional space $L$. Then as we know, for
every collection of nonzero scalars $\lambda_0, \ldots, \lambda_n$ there exists (and it is unique) a nonsingular
linear transformation $\mathcal{A} : L \to L$ mapping $e_i$ to $\lambda_i e'_i$ for all $i = 0, 1, \ldots, n$.
By definition, for such a transformation $\mathcal{A}$, we have $\mathbb{P}(\mathcal{A})(P_i) = P'_i$ for all $i =
0, 1, \ldots, n$. Since $\dim L = n + 1$, we have the relationships

$$e_{n+1} = \alpha_0 e_0 + \alpha_1 e_1 + \cdots + \alpha_n e_n, \qquad e'_{n+1} = \alpha'_0 e'_0 + \alpha'_1 e'_1 + \cdots + \alpha'_n e'_n. \tag{9.20}$$

From the condition of independence of both collections of points (9.19), it follows
that in the representations (9.20), all the coefficients $\alpha_i$ and $\alpha'_i$ are nonzero. Applying
the transformation $\mathcal{A}$ to both sides of the first relationship in (9.20), taking into
account the equalities $\mathcal{A}(e_i) = \lambda_i e'_i$, we obtain

$$\mathcal{A}(e_{n+1}) = \alpha_0 \lambda_0 e'_0 + \alpha_1 \lambda_1 e'_1 + \cdots + \alpha_n \lambda_n e'_n. \tag{9.21}$$

After setting the scalars $\lambda_i$ equal to $\alpha'_i \alpha_i^{-1}$ for all $i = 0, 1, \ldots, n$ and substituting
them into the relationship (9.21), taking into account the second equality of formula
(9.20), we obtain that $\mathcal{A}(e_{n+1}) = e'_{n+1}$, that is, $\mathbb{P}(\mathcal{A})(P_{n+1}) = P'_{n+1}$.

The uniqueness of the projective transformation $\mathbb{P}(\mathcal{A})$ that we have obtained follows
from its construction.
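The construction in this proof is entirely explicit, and it may be followed step by step in code. The sketch below computes the coefficients of (9.20) by Gaussian elimination over the rationals, sets $\lambda_i = \alpha'_i \alpha_i^{-1}$, and assembles the matrix of the transformation with $e_i \mapsto \lambda_i e'_i$ in the standard coordinates. The names `solve`, `projective_map`, `apply`, and `same_point` are ours, and the code assumes that both systems of points are independent (otherwise a division by zero or a `StopIteration` occurs).

```python
from fractions import Fraction

def solve(basis, v):
    """Coefficients c with v = sum(c[i] * basis[i]); basis must be linearly independent."""
    n = len(basis)
    # Augmented matrix whose columns are the basis vectors followed by v.
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(v[i])] for i in range(n)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        m[c] = [x / m[c][c] for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] for i in range(n)]

def projective_map(src, dst):
    """Matrix (list of rows) of a transformation A with P(A)(src[i]) = dst[i] for all i."""
    size = len(src) - 1                        # n + 1 = dim L
    alpha = solve(src[:size], src[size])       # first relationship in (9.20)
    alpha_p = solve(dst[:size], dst[size])     # second relationship in (9.20)
    lam = [ap / a for a, ap in zip(alpha, alpha_p)]
    cols = []
    for j in range(size):
        unit = [1 if k == j else 0 for k in range(size)]
        c = solve(src[:size], unit)            # j-th standard vector in the basis e_i
        cols.append([sum(c[i] * lam[i] * dst[i][k] for i in range(size))
                     for k in range(size)])
    return [[cols[j][k] for j in range(size)] for k in range(size)]

def apply(matrix, v):
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def same_point(u, v):
    """True if u is nonzero and proportional to v."""
    size = len(u)
    return any(x != 0 for x in u) and all(
        u[i] * v[j] == u[j] * v[i] for i in range(size) for j in range(size))
```

For the three points (1 : 0), (0 : 1), (1 : 1) of the projective line and the targets (1 : 1), (1 : 2), (1 : 3), the sketch yields the matrix with rows (-1, 2) and (-1, 4), which indeed sends (1, 1) to (1, 3).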


For example, for $n = 1$, the space $\mathbb{P}(L)$ is the projective line. Three points
$P_0, P_1, P_2$ are independent if and only if they are distinct. We see that any three
distinct points on the projective line can be mapped into three other distinct points
by a unique projective transformation.

Let us now consider how a projective transformation can be given in coordinate
form. In homogeneous coordinates (9.2), the stipulation of a projective transformation
$\mathbb{P}(\mathcal{A})$ in fact coincides with that of a nonsingular linear transformation $\mathcal{A}$, and
indeed, the homogeneous coordinates of a point $A \in \mathbb{P}(L)$ coincide with the coordinates
of the vector $x$ from (9.1) that determines the line $\langle x \rangle$ corresponding to the
point $A$. Using formula (3.25), we obtain for the homogeneous coordinates $\beta_i$ of
the point $\mathbb{P}(\mathcal{A})(A)$ the following expressions in homogeneous coordinates $\alpha_i$ of the
point $A$:

$$\begin{cases}
\beta_0 = a_{00}\alpha_0 + a_{01}\alpha_1 + a_{02}\alpha_2 + \cdots + a_{0n}\alpha_n, \\
\beta_1 = a_{10}\alpha_0 + a_{11}\alpha_1 + a_{12}\alpha_2 + \cdots + a_{1n}\alpha_n, \\
\quad\cdots \\
\beta_n = a_{n0}\alpha_0 + a_{n1}\alpha_1 + a_{n2}\alpha_2 + \cdots + a_{nn}\alpha_n.
\end{cases} \tag{9.22}$$

Here we must recall that the homogeneous coordinates are defined only up to a
common factor, and both collections $(\alpha_0 : \alpha_1 : \cdots : \alpha_n)$ and $(\beta_0 : \beta_1 : \cdots : \beta_n)$ are
not identically zero. Clearly, in multiplying all the $\alpha_i$ by the common factor $\lambda$, all $\beta_i$
in formula (9.22) are also multiplied by this factor. The $\beta_i$ cannot all be equal to zero if
the $\alpha_i$ are not all equal to zero (this follows from the fact that the transformation $\mathcal{A}$ is
nonsingular). The condition of nonsingularity of the transformation $\mathcal{A}$ is expressed
as the determinant of its matrix being nonzero:

$$\begin{vmatrix}
a_{00} & a_{01} & \cdots & a_{0n} \\
a_{10} & a_{11} & \cdots & a_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n0} & a_{n1} & \cdots & a_{nn}
\end{vmatrix} \neq 0.$$

Another way of writing a projective transformation is in inhomogeneous coordinates
of affine spaces. Let us recall that a projective space $\mathbb{P}(L)$ contains affine
subsets $V_i$, $i = 0, 1, \ldots, n$, and it can be obtained from any of the $V_i$ by the addition
of the corresponding projective hyperplane $\mathbb{P}(L_i)$ consisting of "points at infinity,"
that is, in the form $\mathbb{P}(L) = V_i \cup \mathbb{P}(L_i)$. For simplicity of notation, we shall limit
ourselves to the case $i = 0$; all the remaining $V_i$ are considered analogously.

To an affine subset $V_0$ there corresponds (as its space of vectors) the vector
subspace $L_0 \subset L$ defined by the condition $\alpha_0 = 0$. For assigning coordinates in the
affine space $V_0$, we must fix in the space some frame of reference consisting of a
point $O \in V_0$ and a basis in the space $L_0$. In the $(n + 1)$-dimensional space $L$, let us
choose a basis $e_0, e_1, \ldots, e_n$. For the point $O \in V_0$, let us choose the point associated
with the line $\langle e_0 \rangle$, and for the basis in $L_0$, let us take the vectors $e_1, \ldots, e_n$.

Let us consider a point $A \in V_0$, which in the basis $e_0, e_1, \ldots, e_n$ of the space $L$
has homogeneous coordinates $(\alpha_0 : \alpha_1 : \cdots : \alpha_n)$, and repeating the arguments that
we used in deriving formulas (9.6), let us find its coordinates with respect to the
frame of reference $(O; e_1, \ldots, e_n)$ constructed in the manner outlined above. The
point $A$ corresponds to the line $\langle e \rangle$, where

$$e = \alpha_0 e_0 + \alpha_1 e_1 + \cdots + \alpha_n e_n, \tag{9.23}$$

and moreover, $\alpha_0 \neq 0$, since $A \in V_0$. By assumption, we must choose from both
lines $\langle e_0 \rangle$ and $\langle e \rangle$, vectors $x$ and $y$ with coordinate $\alpha_0 = 1$ and examine the coordinates
of the vector $y - x$ with respect to the basis $e_1, \ldots, e_n$. It is obvious that
$x = e_0$, and in view of (9.23), we have

$$y = e_0 + \alpha_0^{-1}\alpha_1 e_1 + \cdots + \alpha_0^{-1}\alpha_n e_n.$$

Thus the vector $y - x$ has, in the basis $e_1, \ldots, e_n$, coordinates

$$x_1 = \frac{\alpha_1}{\alpha_0}, \quad \ldots, \quad x_n = \frac{\alpha_n}{\alpha_0}.$$


We shall now consider a nonsingular linear transformation $\mathcal{A} : L \to L$ and the
associated projective transformation $\mathbb{P}(\mathcal{A})$, given by formulas (9.22). It takes a point
$A$ with homogeneous coordinates $\alpha_i$ to a point $B$ with homogeneous coordinates $\beta_i$.
In order to obtain in both cases inhomogeneous coordinates in the subset $V_0$, it is
necessary, by formula (9.6), to divide all the coordinates by the coordinate with
index 0. Thus we obtain that a point with inhomogeneous coordinates $x_i = \alpha_i / \alpha_0$ is
mapped to the point with inhomogeneous coordinates $y_i = \beta_i / \beta_0$, that is, taking into
account (9.22), we obtain the expressions

$$y_i = \frac{a_{i0} + a_{i1}x_1 + \cdots + a_{in}x_n}{a_{00} + a_{01}x_1 + \cdots + a_{0n}x_n}, \qquad i = 1, \ldots, n. \tag{9.24}$$

In other words, in inhomogeneous coordinates, a projective transformation can be
written in terms of the linear fractional formulas (9.24) with a common denominator
for all $y_i$. It is not defined at points where this denominator becomes zero, and these
are mapped to the "points at infinity," that is, to points of the projective hyperplane $\mathbb{P}(L_0)$ with
equation $\beta_0 = 0$.

Let us consider projective transformations mapping “points at infinity” to “points 
at infinity” and consequently, “finite points” to “finite points.” This means that the 
equality Bg = 0 is possible only for ap = 0, that is, taking into account formula 
(9.22), the equality 


agoag + agi ay + ag2a@2 +--+ + dona, = 0 


is possible only for a9 = 0. Obviously, this latter condition is equivalent to the con- 
ditions ao; = O for all i = 1,...,n. In this case, the common denominator of the 
linear fractional formulas (9.24) reduces to the constant ago. From the nonsingular- 
ity of the transformation A, it follows that aop9 4 0, and we can divide the numer- 
ators in equalities (9.24) by ao9. We then obtain precisely the formulas for affine 
transformations (8.17). Thus affine transformations are special cases of projective 
transformations, namely, those that take the set of “points at infinity” to itself. 
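
The passage from the matrix $(a_{ij})$ to the linear fractional formulas (9.24) can be illustrated by a short Python sketch. The function name `apply_projective` and the two sample matrices are hypothetical choices made only for this illustration; exact rational arithmetic is used so that the vanishing of the common denominator can be tested reliably.

```python
from fractions import Fraction


def apply_projective(a, x):
    """Apply formulas (9.24) to the inhomogeneous coordinates x = (x_1, ..., x_n).

    `a` is an (n+1) x (n+1) matrix whose rows and columns are indexed from 0.
    Returns None when the common denominator vanishes, that is, when the
    image is a "point at infinity".
    """
    # Homogeneous coordinates (1 : x_1 : ... : x_n) of the finite point.
    hom = [Fraction(1)] + [Fraction(v) for v in x]
    # beta_i = a_i0 * alpha_0 + ... + a_in * alpha_n
    beta = [sum(Fraction(row[j]) * hom[j] for j in range(len(hom))) for row in a]
    if beta[0] == 0:
        return None
    return [b / beta[0] for b in beta[1:]]


# A nonsingular matrix with a_01 != 0: a genuinely linear fractional map.
general = [[1, 1, 0],
           [0, 2, 0],
           [0, 0, 3]]

# A nonsingular matrix with a_01 = a_02 = 0: the denominator is the constant
# a_00 = 2, and the map is affine: y_1 = 2 + x_1, y_2 = 3 + 2 * x_2.
affine = [[2, 0, 0],
          [4, 2, 0],
          [6, 0, 4]]

print(apply_projective(general, [1, 1]))
print(apply_projective(general, [-1, 5]))   # denominator 1 + x_1 vanishes
print(apply_projective(affine, [1, 1]))
```

In the second matrix the first row is $(a_{00}, 0, \ldots, 0)$, so every finite point has a finite image, in agreement with the description of affine transformations given above.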


Example 9.15 In the case $\dim \mathbb{P}(L) = 1$, the projective line $\mathbb{P}(L)$ has a single
inhomogeneous coordinate, and formula (9.24) assumes the form

$$y = \frac{a + bx}{c + dx}, \qquad ad - bc \neq 0.$$

Transformations of the “finite part” of the projective line ($x \neq \infty$) are affine and
have the form $y = \alpha + \beta x$, where $\beta \neq 0$.


9.3 The Cross Ratio 


Let us recall that in Sect. 8.2, we defined the affine ratio (A, B, C) among three 
collinear points of an affine space, and then, in Sect. 8.3, it was proved (The- 
orem 8.28) that the affine ratio (A, B,C) among three collinear points does not 
change under a nonsingular affine transformation. In projective spaces, the notion 
of a relationship among three collinear points cannot be given a natural analogue. 
This is the result of the following assertion. 


Theorem 9.16 Let $A_1, B_1, C_1$ and $A_2, B_2, C_2$ be two triples of points in a projective
space satisfying the following conditions:


(a) The three points in each triple are distinct. 
(b) The points in each triple are collinear (one line for each triple). 


Then there exists a projective transformation taking one triple into the other. 


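Proof Let us denote the line on which the three points $A_i, B_i, C_i$ lie by $l_i$, where
$i = 1, 2$. Points $A_1, B_1, C_1$ are independent on $l_1$, and the points $A_2, B_2, C_2$ are
independent on $l_2$. Let the point $A_i$ be determined by the line $\langle \mathbf{e}_i \rangle$, point $B_i$ by the
line $\langle \mathbf{f}_i \rangle$, point $C_i$ by the line $\langle \mathbf{g}_i \rangle$, and line $l_i$ by the two-dimensional space $L_i$,
$i = 1, 2$. They are all contained in the space $L$ that determines our projective space.
Repeating the proof of Theorem 9.14 verbatim, we shall construct an isomorphism
$\mathcal{A}' : L_1 \to L_2$ taking the lines $\langle \mathbf{e}_1 \rangle$, $\langle \mathbf{f}_1 \rangle$, $\langle \mathbf{g}_1 \rangle$ to the lines $\langle \mathbf{e}_2 \rangle$, $\langle \mathbf{f}_2 \rangle$, $\langle \mathbf{g}_2 \rangle$
respectively. Let us represent the space $L$ in the form of two decompositions:

$$L = L_1 \oplus L_1', \qquad L = L_2 \oplus L_2'.$$

It is obvious that $\dim L_1' = \dim L_2' = \dim L - 2$, and therefore, the spaces $L_1'$ and
$L_2'$ are isomorphic. We shall choose some isomorphism $\mathcal{A}'' : L_1' \to L_2'$ and define a
transformation $\mathcal{A} : L \to L$ as $\mathcal{A}'$ on $L_1$ and as $\mathcal{A}''$ on $L_1'$, while for arbitrary vectors
$\mathbf{x} \in L$, we shall use the decomposition $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_1'$, $\mathbf{x}_1 \in L_1$, $\mathbf{x}_1' \in L_1'$, to define
$\mathcal{A}(\mathbf{x}) = \mathcal{A}'(\mathbf{x}_1) + \mathcal{A}''(\mathbf{x}_1')$. It is easy to see that $\mathcal{A}$ is a nonsingular linear
transformation, and the projective transformation $\mathbb{P}(\mathcal{A})$ takes the triple of points $A_1, B_1, C_1$
to $A_2, B_2, C_2$.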


Analogously to the fact that for a triple of collinear points $A, B, C$ of an affine
space, there is an associated number $(A, B, C)$ that is unchanged under every
nonsingular affine transformation, in a projective space we can associate with a
quadruple of collinear points $A_1, A_2, A_3, A_4$ a number that does not change under
projective transformations. This number is denoted by $(A_1, A_2, A_3, A_4)$ and is called the
cross or anharmonic ratio of these four points. We now turn to its definition.

Let us consider first the projective line $l = \mathbb{P}(L)$, where $\dim L = 2$. Four arbitrary
points $A_1, A_2, A_3, A_4$ on $l$ correspond to four lines $\langle \mathbf{a}_1 \rangle, \langle \mathbf{a}_2 \rangle, \langle \mathbf{a}_3 \rangle, \langle \mathbf{a}_4 \rangle$ lying in the
plane $L$. In the plane $L$, let us choose a basis $\mathbf{e}_1, \mathbf{e}_2$ and consider the decomposition
of the vectors $\mathbf{a}_i$ in this basis: $\mathbf{a}_i = x_i \mathbf{e}_1 + y_i \mathbf{e}_2$, $i = 1, \ldots, 4$. The coordinates of the
vectors $\mathbf{a}_1, \ldots, \mathbf{a}_4$ can be written as the columns of the matrix

$$M = \begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ y_1 & y_2 & y_3 & y_4 \end{pmatrix}.$$

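Consider the following question: how do the minors of order 2 of the matrix $M$
change under a transition to another basis $\mathbf{e}_1', \mathbf{e}_2'$ of the plane $L$? Let us denote by
$[\mathbf{a}_i]$ and $[\mathbf{a}_i]'$ the columns of the coordinates of the vector $\mathbf{a}_i$ in the bases $(\mathbf{e}_1, \mathbf{e}_2)$
and $(\mathbf{e}_1', \mathbf{e}_2')$ respectively:

$$[\mathbf{a}_i] = \begin{pmatrix} x_i \\ y_i \end{pmatrix}, \qquad [\mathbf{a}_i]' = \begin{pmatrix} x_i' \\ y_i' \end{pmatrix}.$$

By formula (3.36) for changing coordinates, they are related by $[\mathbf{a}_i] = C[\mathbf{a}_i]'$,
where $C$ is the transition matrix from the basis $\mathbf{e}_1', \mathbf{e}_2'$ to the basis $\mathbf{e}_1, \mathbf{e}_2$. From this
it follows that

$$\begin{pmatrix} x_i & x_j \\ y_i & y_j \end{pmatrix} = C \begin{pmatrix} x_i' & x_j' \\ y_i' & y_j' \end{pmatrix}$$

for any choice of indices $i$ and $j$, and by the theorem on multiplication of
determinants, we obtain

$$\begin{vmatrix} x_i & x_j \\ y_i & y_j \end{vmatrix} = |C| \cdot \begin{vmatrix} x_i' & x_j' \\ y_i' & y_j' \end{vmatrix},$$

where $|C| \neq 0$. This means that for any three indices $i, j, k$, the relation

$$\frac{\begin{vmatrix} x_i & x_j \\ y_i & y_j \end{vmatrix}}{\begin{vmatrix} x_i & x_k \\ y_i & y_k \end{vmatrix}} \tag{9.25}$$

is unaltered under a change of basis (we assume now that both determinants, in
the numerator and denominator, are nonzero). Thus relationship (9.25) determines a
number $(\mathbf{a}_i, \mathbf{a}_j, \mathbf{a}_k)$ depending on the three vectors $\mathbf{a}_i, \mathbf{a}_j, \mathbf{a}_k$ but not on the choice
of basis in $L$.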

However, this is not yet what we promised: the points $A_i$ indeed determine the
lines $\langle \mathbf{a}_i \rangle$, but not the vectors $\mathbf{a}_i$. We know that the vector $\mathbf{a}_i'$ determines the same
line as the vector $\mathbf{a}_i$ if and only if $\mathbf{a}_i' = \lambda_i \mathbf{a}_i$, $\lambda_i \neq 0$. Therefore, if in expression
(9.25) we replace the coordinates of the vectors $\mathbf{a}_i, \mathbf{a}_j, \mathbf{a}_k$ with the coordinates of
the proportional vectors $\mathbf{a}_i', \mathbf{a}_j', \mathbf{a}_k'$, then its numerator will be multiplied by $\lambda_i \lambda_j$,
while its denominator will be multiplied by $\lambda_i \lambda_k$, with the result that the entire
expression (9.25) will be multiplied by the number $\lambda_j / \lambda_k$, which means that it will
change.


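However, if we now consider the expression

$$\mathrm{DV}(A_1, A_2, A_3, A_4) = \frac{\begin{vmatrix} x_1 & x_3 \\ y_1 & y_3 \end{vmatrix} \cdot \begin{vmatrix} x_2 & x_4 \\ y_2 & y_4 \end{vmatrix}}{\begin{vmatrix} x_1 & x_4 \\ y_1 & y_4 \end{vmatrix} \cdot \begin{vmatrix} x_2 & x_3 \\ y_2 & y_3 \end{vmatrix}}, \tag{9.26}$$

then as our previous reasoning demonstrates, it will depend neither on the choice
of basis of the plane $L$ nor on the choice of vectors $\mathbf{a}_i$ on the lines $\langle \mathbf{a}_i \rangle$, but will
be determined only by the four points $A_1, A_2, A_3, A_4$ on the projective line $l$. It is
expression (9.26) that is called the cross ratio of these four points.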
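
The two invariance properties of expression (9.26) can be checked numerically. The following Python sketch is only an illustration: the helper names `det2`, `cross_ratio`, and `change_basis`, the sample vectors, the scaling factors, and the matrix `C` are all assumptions introduced here, and integer coordinates are used so that the arithmetic is exact.

```python
from fractions import Fraction


def det2(u, v):
    """Minor of order 2 formed by the coordinate columns u = (x_i, y_i), v = (x_j, y_j)."""
    return u[0] * v[1] - u[1] * v[0]


def cross_ratio(a1, a2, a3, a4):
    """Expression (9.26) for four vectors with integer coordinates."""
    numerator = det2(a1, a3) * det2(a2, a4)
    denominator = det2(a1, a4) * det2(a2, a3)
    return Fraction(numerator, denominator)


def change_basis(c, v):
    """Multiply the coordinate column v by the 2 x 2 matrix c."""
    return (c[0][0] * v[0] + c[0][1] * v[1],
            c[1][0] * v[0] + c[1][1] * v[1])


# Four vectors spanning four distinct lines of the plane (sample data).
vectors = [(0, 1), (1, 1), (2, 1), (3, 1)]
base = cross_ratio(*vectors)

# Replace each a_i by lambda_i * a_i with nonzero lambda_i: the lines are unchanged.
scaled = [(lam * x, lam * y) for lam, (x, y) in zip((2, -3, 5, 7), vectors)]

# Pass to other coordinates by a nonsingular matrix C (here |C| = 5).
C = [[2, 1], [1, 3]]
rebased = [change_basis(C, v) for v in vectors]

print(base, cross_ratio(*scaled), cross_ratio(*rebased))
```

Each vector $\mathbf{a}_i$ enters the numerator and the denominator of (9.26) exactly once, which is why the factors $\lambda_i$ and the factor $|C|^2$ cancel.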

Let us write the expression for $\mathrm{DV}(A_1, A_2, A_3, A_4)$ assuming that homogeneous
coordinates have been introduced on the projective line $l$. Let us begin with the
formula written in the homogeneous coordinates $(x : y)$. We shall now consider the
points $A_i$ “finite” points of $l$, that is, we assume that $y_i \neq 0$ for all $i = 1, \ldots, 4$, and
we set $t_i = x_i / y_i$; these will be the coordinates of the point $A_i$ in the “affine part”
of the projective line $l$. Then we obtain

$$\begin{vmatrix} x_i & x_j \\ y_i & y_j \end{vmatrix} = y_i y_j \begin{vmatrix} t_i & t_j \\ 1 & 1 \end{vmatrix} = y_i y_j (t_i - t_j).$$

Substituting these expressions into formula (9.26), we see that all the $y_i$ cancel, and
as a result, we obtain the expression

$$\mathrm{DV}(A_1, A_2, A_3, A_4) = \frac{(t_1 - t_3)(t_2 - t_4)}{(t_1 - t_4)(t_2 - t_3)}. \tag{9.27}$$


If we assume that all four points $A_1, A_2, A_3, A_4$ lie in the “finite part” of the
plane, then this means in particular that they belong to the affine part of the
projective line $l$ and have finite coordinates $t_1, t_2, t_3, t_4$ on the projective line $l$. Taking into
account formula (8.8) for the affine ratio of three points, we observe that then the
expression for the cross ratio takes the form

$$\mathrm{DV}(A_1, A_2, A_3, A_4) = \frac{(A_3, A_2, A_1)}{(A_4, A_2, A_1)}. \tag{9.28}$$

Equality (9.28) shows the connection between the cross ratio and the affine ratio
introduced in Sect. 8.2.

We have determined the cross ratio for four distinct points. In the case in which
two of these points coincide, it is possible to define this ratio under some natural
conventions (as we did for the affine ratio), setting the cross ratio in some cases
equal to $\infty$. However, the cross ratio remains undefined if three of the four points
coincide.

The above reasoning almost contains the proof of the following fundamental 
property of the cross ratio. 


Theorem 9.17 The cross ratio of four collinear points in a projective space does 
not change under a projective transformation of the space. 


Fig. 9.3 Perspective mapping


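Proof Let $A_1, A_2, A_3, A_4$ be four points lying on the line $l'$ in some projective space
$\mathbb{P}(L)$. They correspond to the four lines $\langle \mathbf{a}_1 \rangle, \langle \mathbf{a}_2 \rangle, \langle \mathbf{a}_3 \rangle, \langle \mathbf{a}_4 \rangle$ of the space $L$, and the
line $l'$ corresponds to the two-dimensional subspace $L' \subset L$. Let $\mathcal{A}$ be a nonsingular
transformation of the space $L$, and $\varphi = \mathbb{P}(\mathcal{A})$ the corresponding projective
transformation of the space $\mathbb{P}(L)$. Then by Theorem 9.12, $\varphi(l') = l''$ is another line in
the projective space $\mathbb{P}(L)$; it corresponds to the subspace $\mathcal{A}(L') \subset L$ and contains
the four points $\varphi(A_1), \varphi(A_2), \varphi(A_3), \varphi(A_4)$. Let the vectors $\mathbf{e}_1, \mathbf{e}_2$ form a basis of
$L'$ and write the vectors $\mathbf{a}_i$ as $\mathbf{a}_i = x_i \mathbf{e}_1 + y_i \mathbf{e}_2$, $i = 1, \ldots, 4$. Then the cross ratio
$\mathrm{DV}(A_1, A_2, A_3, A_4)$ is defined by the formula (9.26).

On the other hand, $\mathcal{A}(\mathbf{a}_i) = x_i \mathcal{A}(\mathbf{e}_1) + y_i \mathcal{A}(\mathbf{e}_2)$, and if we use the basis $\mathbf{f}_1 =
\mathcal{A}(\mathbf{e}_1)$, $\mathbf{f}_2 = \mathcal{A}(\mathbf{e}_2)$ of the subspace $\mathcal{A}(L')$, then the cross ratio

$$\mathrm{DV}(\varphi(A_1), \varphi(A_2), \varphi(A_3), \varphi(A_4))$$

is defined by the same formula (9.26), since the coordinates of the vectors $\mathcal{A}(\mathbf{a}_i)$ in
the basis $\mathbf{f}_1, \mathbf{f}_2$ coincide with the coordinates of the vectors $\mathbf{a}_i$ in the basis $\mathbf{e}_1, \mathbf{e}_2$.
But as we have already verified, the cross ratio depends neither on the choice of
basis nor on the choice of vectors $\mathbf{a}_i$ that determine the lines $\langle \mathbf{a}_i \rangle$. Therefore, it
follows that

$$\mathrm{DV}(A_1, A_2, A_3, A_4) = \mathrm{DV}(\varphi(A_1), \varphi(A_2), \varphi(A_3), \varphi(A_4)).$$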
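
On the projective line, Theorem 9.17 can be checked directly with formula (9.27) and the linear fractional form of Example 9.15. The sketch below is an illustration under stated assumptions: the function names `cross_ratio_affine` and `linear_fractional`, the four sample points, and the coefficients $a = 1$, $b = 2$, $c = 3$, $d = 1$ are chosen here and do not come from the text.

```python
from fractions import Fraction


def cross_ratio_affine(t1, t2, t3, t4):
    """Formula (9.27) for four finite points with affine coordinates t_1, ..., t_4."""
    t1, t2, t3, t4 = (Fraction(t) for t in (t1, t2, t3, t4))
    return (t1 - t3) * (t2 - t4) / ((t1 - t4) * (t2 - t3))


def linear_fractional(a, b, c, d, x):
    """The map y = (a + b*x) / (c + d*x) of Example 9.15, assuming a*d - b*c != 0.

    Raises ZeroDivisionError when c + d*x = 0, i.e. when x is sent to infinity.
    """
    x = Fraction(x)
    return (a + b * x) / (c + d * x)


points = [0, 1, 2, 3]                      # sample finite points
a, b, c, d = 1, 2, 3, 1                    # a*d - b*c = -5, so the map is nonsingular
images = [linear_fractional(a, b, c, d, t) for t in points]

before = cross_ratio_affine(*points)
after = cross_ratio_affine(*images)
print(before, after)
```

The sample points were chosen so that none of them, and none of their images, is the point at infinity; handling that case would require the homogeneous formula (9.26) instead of (9.27).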


Example 9.18 In a projective plane $\Pi$, let us consider two lines $l_1$ and $l_2$ and a point
$O$ lying on neither of the lines. Let us connect an arbitrary point $A \in l_1$ to the point
$O$ by the line $l_A$; see Fig. 9.3. We shall denote the point of intersection of the lines
$l_A$ and $l_2$ by $A'$. The mapping of the line $l_1$ into $l_2$ that to each point $A \in l_1$ assigns
the point $A' \in l_2$ is called a perspective mapping.


Let us prove that there exists a projective transformation of the plane $\Pi$ defining
a perspective correspondence between the lines $l_1$ and $l_2$. To this end, let us denote
by $l_0$ the line joining the point $O$ and the point $P = l_1 \cap l_2$, and let us consider
the set $V = \Pi \setminus l_0$. In other words, we shall consider $l_0$ a “line at infinity” and the
points of $V$ will be considered “finite points” of the projective plane. Then on $V$, the
perspective correspondence will be given by a bundle of parallel lines, since these
lines in the “finite part” do not intersect; see Fig. 9.4.

Fig. 9.4 A bundle of parallel lines

More precisely, this bundle defines a mapping of the “finite parts” $l_1'$ and $l_2'$ of
the lines $l_1$ and $l_2$. Since $l_1$ and $l_2$ meet only at the point $P \in l_0$, in the affine plane $V$
the lines $l_1'$ and $l_2'$ are parallel, and the perspective correspondence between them is
defined by the translation $\mathcal{T}_{\mathbf{a}}$ by the vector $\mathbf{a} = \overrightarrow{AA'}$, where $A$ is an arbitrary point on the
line $l_1'$, and $A'$ is the point on the line $l_2'$ corresponding to it under the perspective
correspondence. As we saw above, every nonsingular affine transformation of an
affine plane $V$ is a projective mapping for $\Pi$, and this is even more obviously the
case for a translation. This means that a perspective correspondence is defined by
some projective transformation of the plane $\Pi$. Therefore, from Theorem 9.17, we
deduce the following result.


Theorem 9.19 The cross ratio of four collinear points is preserved under a per- 
spective correspondence. 


9.4 Topological Properties of Projective Spaces* 


The previous discussion in this chapter was related to a projective space P(L), where 
L was a finite-dimensional vector space over an arbitrary field K. If our interest is 
in a particular field (for example, R or C), then all the assertions we have proved 
remain valid, since we used only general algebraic notions (which derive from the 
definition of a field), and nowhere did we use, for example, properties of inequality 
or absolute value. Now let us say a few words about properties related to the notion 
of convergence, or as they are called, topological properties, of projective spaces. It 
makes sense to talk about them if, for example, L is a real or complex vector space, 
that is, the field in question is K = R or C. 

Let us begin by formulating the notion of convergence of a sequence of vectors
$\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ in a space $L$ to a vector $\mathbf{x}$ of the same space. Let us choose in $L$
an arbitrary basis $\mathbf{e}_0, \mathbf{e}_1, \ldots, \mathbf{e}_n$ and let us write the vectors $\mathbf{x}_k$ and $\mathbf{x}$ in this basis:

$$\mathbf{x}_k = \alpha_{k0}\mathbf{e}_0 + \alpha_{k1}\mathbf{e}_1 + \cdots + \alpha_{kn}\mathbf{e}_n, \qquad \mathbf{x} = \beta_0\mathbf{e}_0 + \beta_1\mathbf{e}_1 + \cdots + \beta_n\mathbf{e}_n.$$

We shall say that the sequence of vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ converges to the vector
$\mathbf{x}$ if the sequence of numbers

$$\alpha_{1i}, \alpha_{2i}, \ldots, \alpha_{ki}, \ldots \tag{9.29}$$

for fixed $i$ converges to the number $\beta_i$ as $k \to \infty$ for each index $i = 0, 1, \ldots, n$ (in
speaking about complex vector spaces, we assume that the reader is familiar with the
notion of convergence of a sequence of complex numbers). The vector $\mathbf{x}$ is called,
in this case, the limit of the sequence. From the formulas for changing coordinates
given in Sect. 3.4, it is easy to derive that the property of convergence does not
depend on the basis in $L$. We shall write this convergence as $\mathbf{x}_k \to \mathbf{x}$ as $k \to \infty$.

Let us move now from vectors to points of a projective space. In both cases
that we are considering ($K = \mathbb{R}$ or $\mathbb{C}$), there is a useful method of normalizing the
homogeneous coordinates $(x_0 : x_1 : \cdots : x_n)$ defined, generally speaking, only up to
multiplication by a common factor $\lambda \neq 0$. Since by definition, the equality $x_i = 0$
for all $i = 0, 1, \ldots, n$ is impossible, we may choose a coordinate $x_r$ for which $|x_r|$
(the absolute value in $\mathbb{R}$ or $\mathbb{C}$, respectively) assumes the greatest value, and setting
$\lambda = |x_r|$, make the substitution $y_i = \lambda^{-1}x_i$ for all $i = 0, 1, \ldots, n$. Then, obviously,

$$(x_0 : x_1 : \cdots : x_n) = (y_0 : y_1 : \cdots : y_n),$$

and moreover, $|y_r| = 1$ and $|y_i| \le 1$ for all $i = 0, 1, \ldots, n$.
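
The normalization just described amounts to a single division by the largest absolute value. A minimal Python sketch follows; the function name `normalize` and the sample coordinates are assumptions made for the illustration, and floating-point arithmetic is used, so equalities involving complex numbers hold only up to rounding.

```python
def normalize(coords):
    """Divide homogeneous coordinates by lambda = max |x_i|.

    Works for real or complex entries. The result has max |y_i| = 1 and
    represents the same point of the projective space.
    """
    lam = max(abs(c) for c in coords)
    if lam == 0:
        raise ValueError("all coordinates are zero: not a point of a projective space")
    return [c / lam for c in coords]


p = normalize([3, -6, 2])        # lambda = 6
q = normalize([-6, 12, -4])      # the same point, coordinates multiplied by -2
r = normalize([3 + 4j, 1])       # a complex example, lambda = |3 + 4i| = 5

print(p)
print(q)
print(r)
```

The vectors `p` and `q` differ by the factor $-1$: since $\lambda = |x_r|$ is an absolute value, the normalized coordinates are still determined only up to a factor of absolute value 1. This does not matter for the compactness argument, which uses only the bounds $|y_i| \le 1$ and $\max |y_i| = 1$.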


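Definition 9.20 A sequence of points $P_1, P_2, \ldots, P_k, \ldots$ converges to the point $P$
if on every line $\langle \mathbf{e}_k \rangle$ that determines the point $P_k$, and on the line $\langle \mathbf{e} \rangle$ determining
the point $P$, it is possible to find nonnull vectors $\mathbf{x}_k$ and $\mathbf{x}$ such that $\mathbf{x}_k \to \mathbf{x}$ as
$k \to \infty$. This is written as $P_k \to P$ as $k \to \infty$. The point $P$ is called the limit of the
sequence $P_1, P_2, \ldots, P_k, \ldots$.


We note that by assumption, $\langle \mathbf{e}_k \rangle = \langle \mathbf{x}_k \rangle$ and $\langle \mathbf{e} \rangle = \langle \mathbf{x} \rangle$.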


Theorem 9.21 It is possible to choose from an arbitrary infinite sequence of points 
of a projective space a subsequence that converges to a point of the space. 


Proof As we have seen, every point $P$ of a projective space can be represented in the
form $P = \langle \mathbf{y} \rangle$, where the vector $\mathbf{y}$ has coordinates $(y_0, y_1, \ldots, y_n)$, and moreover,
$\max |y_i| = 1$.

It is proved in a course in real analysis that every bounded sequence of real
numbers satisfies the assertion of Theorem 9.21. It is also very easy to prove the
statement for a sequence of complex numbers. To obtain from this the assertion of the
theorem, let us consider an infinite sequence of points $P_1, P_2, \ldots, P_k, \ldots$ of the
projective space $\mathbb{P}(L)$. Let us focus attention first on the sequence of zeroth (that
is, having index 0) coordinates of the vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$ corresponding to
these points. Suppose they are the numbers

$$\alpha_{10}, \alpha_{20}, \ldots, \alpha_{k0}, \ldots. \tag{9.30}$$

As we noted above, we may assume that all $|\alpha_{k0}|$ are less than or equal to 1. By the
assertion from real analysis formulated above, from the sequence (9.30), we may
choose a subsequence

$$\alpha_{n_1 0}, \alpha_{n_2 0}, \ldots, \alpha_{n_k 0}, \ldots \tag{9.31}$$

converging to some number $\beta_0$ that therefore also does not exceed 1 in absolute
value. Let us now consider a subsequence of points $P_{n_1}, P_{n_2}, \ldots, P_{n_k}, \ldots$ and of
vectors $\mathbf{x}_{n_1}, \mathbf{x}_{n_2}, \ldots, \mathbf{x}_{n_k}, \ldots$ with the same indices as those in the subsequence
(9.31). Let us focus attention on the first coordinate of these vectors. For them,
clearly, it is also the case that $|\alpha_{n_k 1}| \le 1$. This means that from the sequence

$$\alpha_{n_1 1}, \alpha_{n_2 1}, \ldots, \alpha_{n_k 1}, \ldots$$

we may choose a subsequence converging to some number $\beta_1$, and moreover, clearly
$|\beta_1| \le 1$.

Repeating this argument $n + 1$ times, we obtain as a result, from the original
sequence of vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k, \ldots$, a subsequence $\mathbf{x}_{m_1}, \mathbf{x}_{m_2}, \ldots, \mathbf{x}_{m_k}, \ldots$
converging to some vector $\mathbf{x} \in L$, which, like every vector of this space, can be
decomposed in terms of the basis $\mathbf{e}_0, \mathbf{e}_1, \ldots, \mathbf{e}_n$, that is,

$$\mathbf{x} = \beta_0\mathbf{e}_0 + \beta_1\mathbf{e}_1 + \cdots + \beta_n\mathbf{e}_n.$$

This gives us the assertion of Theorem 9.21 if we ascertain that not all coordinates
$\beta_0, \beta_1, \ldots, \beta_n$ of the vector $\mathbf{x}$ are equal to zero. But this follows from the fact that
by construction, for each vector $\mathbf{x}_{m_k}$ of the subsequence $\mathbf{x}_{m_1}, \mathbf{x}_{m_2}, \ldots, \mathbf{x}_{m_k}, \ldots$,
a certain coordinate $\alpha_{m_k i}$, $i = 0, \ldots, n$, has absolute value equal to 1. Since there
exists only a finite number of coordinates, and the number of vectors $\mathbf{x}_{m_k}$ is
infinite, there must be an index $i$ such that among the coordinates $\alpha_{m_k i}$, infinitely
many will have absolute value 1. On the other hand, by construction, the sequence
$\alpha_{m_1 i}, \alpha_{m_2 i}, \ldots, \alpha_{m_k i}, \ldots$ converges to the number $\beta_i$, which therefore must have
absolute value equal to 1.


The property established in Theorem 9.21 is called compactness. It holds as well 
for every projective algebraic variety of a projective space (whether real or com- 
plex). We may formulate it as follows. 


Corollary 9.22 In the case of a real or complex space, the points of a projective
algebraic variety form a compact set.


Proof Let the projective algebraic variety $X$ be given by a system of equations (9.5),
and let $P_1, P_2, \ldots, P_k, \ldots$ be a sequence of points in $X$. By Theorem 9.21, there
exists a subsequence of this sequence that converges to some point $P$ of this space. It
remains to prove that the point $P$ belongs to the variety $X$. For this, it suffices to
show that it can be represented in the form $P = \langle \mathbf{u} \rangle$, where the coordinates of the
vector $\mathbf{u}$ satisfy equations (9.5). But this follows at once from the fact that
polynomials are continuous functions. Let $F(x_0, x_1, \ldots, x_n)$ be a polynomial (in this case,
homogeneous; it is one of the polynomials $F_i$ appearing in the system of equations
(9.5)). We shall write it in the form $F = F(\mathbf{x})$, where $\mathbf{x} \in L$. Then from the
convergence of the vectors $\mathbf{x}_k \to \mathbf{x}$ as $k \to \infty$ such that $F(\mathbf{x}_k) = 0$ for all $k$, it follows that
$F(\mathbf{x}) = 0$.


For subsets of a finite-dimensional vector or affine space (whether real or
complex), the property of compactness is related to their boundedness; more precisely,
the property of boundedness follows from compactness. Thus while real and
complex vector or affine spaces can be visualized as “extending unboundedly in all
directions,” for projective spaces, such is not the case. But what does it mean to say
“can be visualized”? In order to formulate this intuitive idea precisely, we shall
introduce for the real and complex projective lines some simple geometric
representations to which they are homeomorphic (see the relevant definition on p. xviii). This
will allow us to give a precise meaning to the words that a given set “can be
visualized.” Let us observe that the property of compactness established in Theorem 9.21
is unchanged under a transition from one set to another that is homeomorphic to it.

Fig. 9.5 The real projective line

Let us begin with the simplest situation: a one-dimensional real projective space,
that is, the real projective line. It consists of pairs $(x_0 : x_1)$, where $x_0$ and $x_1$ are
considered only up to a common factor $\lambda \neq 0$. Those pairs for which $x_0 \neq 0$ form
an affine subset $U$, whose points are given by the single coordinate $t = x_1/x_0$, so
that we may identify the set $U$ with $\mathbb{R}$. Pairs for which $x_0 = 0$ do not enter the set
$U$, but they correspond to only one point $(0 : 1)$ of the projective line, which we
shall denote by $(\infty)$. Thus the real projective line can be represented in the form
$\mathbb{R} \cup (\infty)$.

The convergence of points $P_k \to Q$ as $k \to \infty$ is defined in this case as follows.
If points $P_k \neq (\infty)$ correspond to the numbers $t_k$, and the point $Q \neq (\infty)$
corresponds to the number $t$, then $P_k = (\alpha_k : \beta_k)$ and $Q = (\alpha : \beta)$, where $\beta_k/\alpha_k = t_k$,
$\alpha_k \neq 0$, and $\beta/\alpha = t$, $\alpha \neq 0$. The convergence $P_k \to Q$ as $k \to \infty$ in this case
implies the convergence of the sequence of numbers $t_k \to t$ as $k \to \infty$. In the case
that $P_k \to (\infty)$, the convergence (in the previous notation) means that $\alpha_k \to 0$,
$\beta_k \to 1$ as $k \to \infty$, from which it follows that $t_k^{-1} \to 0$, or equivalently, $|t_k| \to \infty$
as $k \to \infty$.

We can graphically represent the real projective line by drawing a circle tangent
to the horizontal line $l$ at the point $O$; see Fig. 9.5. Connecting the highest point $O'$
of this circle with an arbitrary point $A$ of the circle, we obtain a line that intersects $l$
at some point $B$. We thereby obtain a bijection between points $A \neq O'$ of the circle
and all the points $B$ of the line $l$. If we place the coordinate origin of the line $l$ at the
point $O$ and associate with each point $B \in l$ a number $t \in \mathbb{R}$ resulting from a choice
of some unit measure on the line $l$ (that is, an arbitrary point of the line $l$ different
from $O$ is given the value 1), then we obtain a bijection between numbers $t \in \mathbb{R}$
and points $A \neq O'$ of the circle. Then $|t_k| \to \infty$ if and only if for the corresponding
points $A_k$ of the circle, we have the convergence $A_k \to O'$. Consequently, we obtain
a bijection between points of the real projective line $\mathbb{R} \cup (\infty)$ and all points of the
circle that preserves the notion of convergence. Thus we have proved that the real
projective line is homeomorphic to the circle, which is usually denoted by $S^1$ (the
one-dimensional sphere).

Fig. 9.6 Stereographic projection of the sphere onto the plane

An analogous argument can be applied to the complex projective line. It is
represented in the form $\mathbb{C} \cup (\infty)$. On it, the convergence of a sequence of points $P_k \to Q$
as $k \to \infty$ in the case $Q \neq (\infty)$ corresponds to convergence of a sequence of
complex numbers $z_k \to z$, where $z \in \mathbb{C}$, while the convergence of the sequence of points
$P_k \to (\infty)$ corresponds to the convergence $|z_k| \to \infty$ (here $|z|$ denotes the modulus
of the complex number $z$).

For the graphical representation of the complex projective line, Riemann
proposed the following method; see Fig. 9.6. The complex numbers are depicted in the
usual way as points in a plane. Let us consider a sphere tangent to this plane at the
origin $O$, which corresponds to the complex number $z = 0$. Through the highest
point $O'$ of the sphere and any other point $A$ of the sphere there passes a line
intersecting the complex plane at a point $B$, which represents some number $z \in \mathbb{C}$.
This yields a bijection between numbers $z \in \mathbb{C}$ and all the points of the sphere, with
the exception of the point $O'$. This correspondence is often called the
stereographic projection of the sphere onto the plane. By associating the point $(\infty)$
of the complex projective line with the point $O'$ of the sphere, we obtain a
bijection between the points of the complex projective line $\mathbb{C} \cup (\infty)$ and all the points
of the sphere. It is easy to see that convergence is preserved under this assignment.
Thus the complex projective line is homeomorphic to the two-dimensional sphere
in three-dimensional space, which is denoted by $S^2$.
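
The stereographic projection can be written in explicit coordinates. The sketch below rests on an assumption the text does not make: the sphere is taken to have diameter 1, so that it is tangent to the plane $z = 0$ at $O = (0, 0, 0)$ and its highest point is $O' = (0, 0, 1)$. The function names `plane_to_sphere` and `sphere_to_plane` are likewise hypothetical. A complex number $u + iv$ is represented by the pair $(u, v)$.

```python
from fractions import Fraction


def plane_to_sphere(u, v):
    """Point A of the sphere X^2 + Y^2 + Z^2 = Z lying on the line through
    O' = (0, 0, 1) and the point B = (u, v, 0) of the plane."""
    s = u * u + v * v
    return (u / (1 + s), v / (1 + s), s / (1 + s))


def sphere_to_plane(X, Y, Z):
    """Intersection B of the plane Z = 0 with the line through O' and A = (X, Y, Z)."""
    if Z == 1:
        raise ValueError("O' has no image in the plane; it corresponds to the point (oo)")
    return (X / (1 - Z), Y / (1 - Z))


# The number z = 3 + 4i, in exact arithmetic.
point = plane_to_sphere(Fraction(3), Fraction(4))
X, Y, Z = point
on_sphere = (X * X + Y * Y + Z * Z == Z)
back = sphere_to_plane(X, Y, Z)
print(point, on_sphere, back)
```

In these coordinates the height of the image point is $Z = |z|^2 / (1 + |z|^2)$, which tends to 1 exactly when $|z| \to \infty$; this is the statement that $P_k \to (\infty)$ corresponds to convergence to $O'$ on the sphere.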

In the sequel, we shall limit our consideration to projective spaces $\mathbb{P}(L)$, where $L$
is a real vector space of some finite dimension, and we shall consider for such spaces
the property of orientability. It is related to the concept of continuous deformation
of a linear transformation, which was introduced in Sect. 4.4.

By definition, every projective transformation of a projective space $\mathbb{P}(L)$ has the
form $\mathbb{P}(\mathcal{A})$, where $\mathcal{A}$ is a nonsingular linear transformation of the vector space $L$.
Moreover, as we have seen, the linear transformation $\mathcal{A}$ is determined by the
projective transformation up to a replacement by $\alpha\mathcal{A}$, where $\alpha$ is any nonzero number.


Definition 9.23 A projective transformation is said to be continuously deformable
into another if the first can be represented in the form $\mathbb{P}(\mathcal{A}_1)$ and the second in the
form $\mathbb{P}(\mathcal{A}_2)$, and the linear transformation $\mathcal{A}_1$ is continuously deformable into $\mathcal{A}_2$.


Theorem 4.39 asserts that a linear transformation $\mathcal{A}_1$ is continuously deformable
into $\mathcal{A}_2$ if and only if the determinants $|\mathcal{A}_1|$ and $|\mathcal{A}_2|$ have the same sign. What
happens under a replacement of $\mathcal{A}$ by $\alpha\mathcal{A}$? Let the projective space $\mathbb{P}(L)$ have
dimension $n$. Then the vector space $L$ has dimension $n + 1$, and $|\alpha\mathcal{A}| = \alpha^{n+1}|\mathcal{A}|$. If the
number $n + 1$ is even, then it is always the case that $\alpha^{n+1} > 0$, and such a
replacement does not change the sign of the determinant. In other words, in a projective
space of odd dimension $n$, the sign of the determinant $|\mathcal{A}|$ of a linear
transformation $\mathcal{A}$ is uniquely determined by the transformation $\mathbb{P}(\mathcal{A})$. This clearly yields the
following result.


Theorem 9.24 In a projective space of odd dimension, a projective transformation
$\mathbb{P}(\mathcal{A}_1)$ is continuously deformable into $\mathbb{P}(\mathcal{A}_2)$ if and only if the determinants $|\mathcal{A}_1|$
and $|\mathcal{A}_2|$ have the same sign.


The same considerations can be applied to projective spaces of even dimension, 
but they lead to a different result. 


Theorem 9.25 In a projective space of even dimension, every projective transfor- 
mation is continuously deformable into every other projective transformation. 


Proof Let us show that every projective transformation $\mathbb{P}(\mathcal{A})$ is continuously
deformable into the identity. If $|\mathcal{A}| > 0$, then this follows at once from Theorem 4.39.
And if $|\mathcal{A}| < 0$, then the same theorem gives us that the transformation $\mathcal{A}$ is
continuously deformable into $\mathcal{B}$, which has matrix $\begin{pmatrix} -1 & 0 \\ 0 & E_n \end{pmatrix}$, where $E_n$ is the identity matrix
of order $n$. But $\mathbb{P}(\mathcal{B}) = \mathbb{P}(-\mathcal{B})$, and the transformation $-\mathcal{B}$ has matrix $\begin{pmatrix} 1 & 0 \\ 0 & -E_n \end{pmatrix}$.
Since in our case, the number $n$ is even, it follows that $|-E_n| = (-1)^n > 0$, and
by Theorem 4.38, the matrix $\begin{pmatrix} 1 & 0 \\ 0 & -E_n \end{pmatrix}$ is continuously deformable into $E_{n+1}$, and
consequently, the transformation $-\mathcal{B}$ is continuously deformable into the identity.
Thus the projective transformation $\mathbb{P}(\mathcal{B})$ is continuously deformable into $\mathbb{P}(\mathcal{E})$, and
this means, by definition, that $\mathbb{P}(\mathcal{A})$ is also continuously deformable into $\mathbb{P}(\mathcal{E})$.
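
The parity argument behind Theorems 9.24 and 9.25 rests on the identity $|\alpha\mathcal{A}| = \alpha^{n+1}|\mathcal{A}|$, which can be checked on small matrices. The Python sketch below is illustrative only: the helpers `det` (cofactor expansion along the first row) and `scale`, and the two sample matrices, are assumptions introduced here.

```python
def det(m):
    """Determinant of a square matrix by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def scale(alpha, m):
    """Matrix of the transformation alpha * A."""
    return [[alpha * e for e in row] for row in m]


# Projective line: n = 1, dim L = n + 1 = 2 (even). Scaling keeps the sign.
swap = [[0, 1],
        [1, 0]]
print(det(swap), det(scale(-3, swap)))

# Projective plane: n = 2, dim L = n + 1 = 3 (odd). Scaling by -1 flips the sign.
reflection = [[-1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]]
print(det(reflection), det(scale(-1, reflection)))
```

The second example is the case $n = 2$ of the matrix used in the proof of Theorem 9.25: the transformations $\mathcal{B}$ and $-\mathcal{B}$ define the same projective transformation, yet their determinants have opposite signs.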


Expressing these facts in topological form, we may say that the set of projective
transformations of the space $\mathbb{P}^n$ of a given dimension has a single path-connected
component if $n$ is even, and two path-connected components if $n$ is odd.

Theorems 9.24 and 9.25 show that the properties of projective spaces of even and 
odd dimension are radically different. We encounter this for the first time in the case 
of the projective plane. It differs from the vector (or Euclidean) plane in that it has 
not two, but only one orientation. It is the same with projective spaces of arbitrary 
even dimension. We saw in Sect. 4.4 that the orientation of the affine plane can be 
interpreted as a choice of direction of motion around a circle. Theorem 9.25 shows 
that in the projective plane, this is already not the case—the continuous motion in 
a given direction around a circle in the projective plane can be transformed into 
motion in the opposite direction. This is possible only because our deformation at a
certain moment “passes through infinity,” which is impossible in the affine plane.

This property can be presented graphically using the following construction, 
which is applicable to real projective spaces of arbitrary dimension. 


Fig. 9.7 A model of the projective plane

Fig. 9.8 Identification of points


Let us assume that the vector space $L$ defining our projective space $\mathbb{P}(L)$ is a
Euclidean space, and let us consider in this space the sphere $S$, defined by the equality
$|\mathbf{x}| = 1$. Every line $\langle \mathbf{x} \rangle$ of the space $L$ intersects the sphere $S$. Indeed, such a line
consists of vectors of the form $\alpha\mathbf{x}$, where $\alpha \in \mathbb{R}$, and the condition $\alpha\mathbf{x} \in S$ means
that $|\alpha\mathbf{x}| = 1$. Since $|\alpha\mathbf{x}| = |\alpha| \cdot |\mathbf{x}|$ and $\mathbf{x} \neq \mathbf{0}$, we may set $|\alpha| = |\mathbf{x}|^{-1}$. With this
choice, the number $\alpha$ is determined up to sign, or in other words, there exist two
vectors, $\mathbf{e}$ and $-\mathbf{e}$, belonging to the line $\langle \mathbf{x} \rangle$ and to the sphere $S$. Thus associating
with each vector $\mathbf{e} \in S$ the line $\langle \mathbf{e} \rangle$, regarded as a point of the projective space, we
obtain the mapping $f : S \to \mathbb{P}(L)$. The previous reasoning shows that the image of $f$ is
the entire space $\mathbb{P}(L)$. However, this mapping $f$ is not a bijection, since two points of
the sphere $S$ are mapped to the same point $P \in \mathbb{P}(L)$, corresponding to the line $\langle \mathbf{x} \rangle$,
namely, the vectors $\mathbf{e}$ and $-\mathbf{e}$. This property is expressed by saying that the projective
space is obtained from the sphere $S$ via the identification of its antipodal points.

Let us apply this to the case of the projective plane, that is, we shall suppose
that $\dim \mathbb{P}(L) = 2$. Then $\dim L = 3$, and the sphere $S$ contained in three-dimensional
space is the sphere $S^2$. Let us decompose it into two equal parts by a horizontal
plane; see Fig. 9.7.

Each point of the upper hemisphere is diametrically opposite some point on the 
lower hemisphere, and we can map the upper hemisphere onto the projective plane 
P(L) by representing each point P € P(L) in the form (e), where e is a vector of the 
upper hemisphere. 

However, this correspondence will not be a bijection, since antipodal points on 
the boundary of the hemisphere will be joined together, that is, they correspond to 
a single point; see Fig. 9.8. This is expressed by saying that the projective plane is 
obtained by identifying antipodal points of the boundary of the hemisphere. 

Let us now consider a moving circle with a given direction of rotation; see
Fig. 9.9. The figure shows that when the moving circle crosses the boundary
of the hemisphere, the direction of rotation changes to its opposite.

This property is expressed by saying that the projective plane is a one-sided
surface (while the sphere in three-dimensional space and other familiar surfaces
are two-sided). This property of the projective plane was studied by Möbius. He
presented an example of a one-sided surface that is now known as the Möbius
strip. It can be constructed by cutting from a sheet of paper the rectangle $ABDC$
(Fig. 9.10, left) and gluing together its opposite sides $AB$ and $CD$, after rotating
$CD$ by 180°. The one-sided surface thus obtained is shown in the right-hand picture
of Fig. 9.10, which also shows the continuous deformation of the circle (stages
1 → 2 → 3 → 4), changing the direction of rotation to its opposite.

Fig. 9.9 Motion of a circle

Fig. 9.10 Möbius strip

Fig. 9.11 Partition of the sphere

Fig. 9.12 The central part of the sphere

The Möbius strip also has a direct relationship to the projective plane. Namely,
let us visualize this plane as the sphere $S^2$, in which antipodal points are identified.
Let us divide the sphere into three parts by intersecting it with two parallel planes
that pass above and below the equator. As a result, the sphere is partitioned into a
central part $U$ and two “caps” above and below; see Fig. 9.11.

Let us begin by studying the central section $U$. For each point of $U$, its antipodal
point is also contained in $U$. Let us divide $U$ into two halves, front and back, by
a vertical plane intersecting $U$ in the arcs $AB$ and $CD$; see Fig. 9.12.

We may combine the front half ($U'$) with the rectangle $ABDC$ in Fig. 9.10.
Every point of the central section $U$ either itself belongs to the front half or else has
an antipodal point that belongs to the front half, of which there is only one, except
for the points of the segments $AB$ and $CD$. In order to obtain only one of the two
antipodal points of these segments, we must glue these segments together exactly as
is done in Fig. 9.10. Thus the Möbius strip is homeomorphic to the part $U'$ of the
projective plane. To obtain the remaining part $V = \mathbb{P}(L) \setminus U'$, we have to consider
the “caps” on the sphere; see Fig. 9.11. For every point in a cap, its antipodal point
lies in the other cap. This means that by identifying antipodal points, it suffices to
consider only one cap, for example the upper one. This cap is homeomorphic to a
disk: to see this, it suffices simply to project it onto the horizontal plane. Clearly,
the boundary of the upper cap is identified with the boundary of the central part
of the sphere. Thus the projective plane is homeomorphic to the surface obtained
by gluing a disk to the Möbius strip in such a way that its boundary is identified
with the boundary of the Möbius strip (it is easily verified that the boundary of the
Möbius strip is a circle).


Chapter 10 
The Exterior Product and Exterior Algebras 


10.1 Plücker Coordinates of a Subspace


The fundamental idea of analytic geometry, which goes back to Fermat and 
Descartes, consists in the fact that every point of the two-dimensional plane or 
three-dimensional space is defined by its coordinates (two or three, respectively). 
Of course, there must also be present a particular choice of coordinate system. In 
this course, we have seen that this very principle is applicable to many spaces of 
more general types: vector spaces of arbitrary dimension, as well as Euclidean, 
affine, and projective spaces. In this chapter, we shall show that it can be applied 
to the study of vector subspaces M of fixed dimension m in a given vector space 
L of dimension n > m. Since there is a bijection between the m-dimensional subspaces
$M \subset L$ and (m − 1)-dimensional projective subspaces $P(M) \subset P(L)$, we shall
therefore also obtain a description of the projective subspaces of fixed dimension 
of a projective space with the aid of “coordinates” (certain collections of num- 
bers). 

The case of points of a projective space (subspaces of dimension 0) was already 
analyzed in the previous chapter: they are given by homogeneous coordinates. The 
same holds in the case of hyperplanes of a projective space P(L): they correspond 
to the points of the dual space P(L*). The simplest case in which the problem is 
not reduced to the two cases given above is the set of projective lines in three-dimensional
projective space. Here a solution was proposed by Plücker, and therefore,
in the most general case, the “coordinates” corresponding to the subspace
are called Plücker coordinates. Following the course of history, we shall begin in
Sects. 10.1 and 10.2 by describing these using some coordinate system, and then 
investigate the construction we have introduced in an invariant way, in order to de- 
termine which of its elements depend on the choice of coordinate system and which 
do not. 

Therefore, we now assume that some basis has been chosen in the vector space L.
Since dim L = n, every vector $a \in L$ has in this basis n coordinates. Let us consider
a subspace $M \subset L$ of dimension m < n. Let us choose an arbitrary basis $a_1, \ldots, a_m$
of the subspace M. Then $M = \langle a_1, \ldots, a_m \rangle$, and the vectors $a_1, \ldots, a_m$ are linearly
I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry, 349


DOI 10.1007/978-3-642-30994-6_10, © Springer-Verlag Berlin Heidelberg 2013 




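independent. The vector $a_i$ has, in the chosen basis of the space L, coordinates
$a_{i1}, \ldots, a_{in}$ ($i = 1, \ldots, m$), which we can arrange in the form of a matrix M of type
(m, n), writing them in row form:

\[
M = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}. \tag{10.1}
\]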
The condition that the vectors $a_1, \ldots, a_m$ are linearly independent means that the
rank of the matrix M is equal to m, that is, at least one of its minors of order m is nonzero.
Since the number of rows of the matrix M is equal to m, a minor of order m is
uniquely defined by the indices of its columns. Let us denote by $M_{i_1,\ldots,i_m}$ the minor
consisting of columns with indices $i_1, \ldots, i_m$, which assume the various values from
1 to n.

We know that not all of the minors $M_{i_1,\ldots,i_m}$ can be equal to zero at the same
time. Let us examine how they depend on the choice of basis $a_1, \ldots, a_m$ in M. If
$b_1, \ldots, b_m$ is some other basis of this subspace, then

\[
b_i = b_{i1} a_1 + \cdots + b_{im} a_m, \qquad i = 1, \ldots, m.
\]

Since the vectors $b_1, \ldots, b_m$ are linearly independent, the determinant $|(b_{ij})|$ is
nonzero. Let us set $c = |(b_{ij})|$. If $M'_{i_1,\ldots,i_m}$ is a minor of the matrix M', constructed
analogously to M using the vectors $b_1, \ldots, b_m$, then by formula (3.35) and Theorem 2.54
on the determinant of a product of matrices, we have the relationship

\[
M'_{i_1,\ldots,i_m} = c\, M_{i_1,\ldots,i_m}. \tag{10.2}
\]
The numbers $M_{i_1,\ldots,i_m}$ that we have determined are not independent. Namely, if
the unordered collection of numbers $j_1, \ldots, j_m$ coincides with $i_1, \ldots, i_m$ (that is,
comprises the same numbers, perhaps arranged in a different order), then as we saw
in Sect. 2.6, we have the relationship

\[
M_{j_1,\ldots,j_m} = \pm M_{i_1,\ldots,i_m}, \tag{10.3}
\]

where the sign + or − appears depending on whether the number of transpositions
necessary to effect the passage from the collection $(i_1, \ldots, i_m)$ to $(j_1, \ldots, j_m)$ is
even or odd. In other words, the function $M_{i_1,\ldots,i_m}$ of m arguments $i_1, \ldots, i_m$
assuming the values 1, ..., n is antisymmetric.

In particular, we may take as the collection $(j_1, \ldots, j_m)$ the arrangement of
the numbers $i_1, \ldots, i_m$ in ascending order, $j_1 < j_2 < \cdots < j_m$; the corresponding minor
$M_{j_1,\ldots,j_m}$ will coincide with either $M_{i_1,\ldots,i_m}$ or $-M_{i_1,\ldots,i_m}$. In view of this, in what
follows we consider only the minors with increasing collections of indices, and we set

\[
p_{i_1,\ldots,i_m} = M_{i_1,\ldots,i_m} \tag{10.4}
\]

for all collections $i_1 < i_2 < \cdots < i_m$ of the numbers 1, ..., n. Thus we assign to the
subspace M as many of the numbers $p_{i_1,\ldots,i_m}$ as there are combinations of n things
taken m at a time, that is, $\nu = C_n^m$. From formula (10.3) and the condition that the
rank of the matrix M is equal to m, it follows that these numbers $p_{i_1,\ldots,i_m}$ cannot
all become zero simultaneously. On the other hand, formula (10.2) shows that in
replacing the basis $a_1, \ldots, a_m$ of the subspace M by some other basis $b_1, \ldots, b_m$
of this subspace, all these numbers are simultaneously multiplied by some number
$c \neq 0$. Thus the numbers $p_{i_1,\ldots,i_m}$ for $i_1 < i_2 < \cdots < i_m$ can be taken as the homogeneous
coordinates of a point of the projective space $P^{\nu-1} = P(N)$, where dim N = ν
and dim P(N) = ν − 1.


Definition 10.1 The totality of numbers $p_{i_1,\ldots,i_m}$ in (10.4) for all collections
$i_1 < i_2 < \cdots < i_m$ taking the values 1, ..., n is called the Plücker coordinates of the
m-dimensional subspace $M \subset L$.
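Definition 10.1 and relationship (10.2) can be illustrated by a short computation. The following Python sketch assumes that the basis vectors are given by integer coordinates; the helper names `det`, `minor`, and `plucker_coordinates`, as well as the sample matrices `A` and `B`, are illustrative choices and do not come from the text.

```python
from itertools import combinations

def det(a):
    """Determinant by Laplace expansion along the first row (exact for integers)."""
    if not a:
        return 1
    total = 0
    for j in range(len(a)):
        sub = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det(sub)
    return total

def minor(rows, cols):
    """Minor of the matrix `rows` formed by the columns in `cols` (0-based)."""
    return det([[row[c] for c in cols] for row in rows])

def plucker_coordinates(rows):
    """Map every increasing index collection (1-based) to the minor (10.4)."""
    m, n = len(rows), len(rows[0])
    return {tuple(c + 1 for c in cols): minor(rows, cols)
            for cols in combinations(range(n), m)}

# A basis a_1, a_2 of a plane M in a 4-dimensional space (rows of the matrix (10.1)).
A = [[1, 2, 0, 1],
     [0, 1, 3, 2]]

# Another basis b_i = sum_j B[i][j] * a_j of the same plane.
B = [[2, 1],
     [1, 3]]
A2 = [[sum(B[i][j] * A[j][k] for j in range(2)) for k in range(4)]
      for i in range(2)]

p = plucker_coordinates(A)
p2 = plucker_coordinates(A2)
c = det(B)

print(p)
print(all(p2[I] == c * p[I] for I in p))  # relationship (10.2)
```

Passing to the basis given by `B` multiplies every coordinate by the same factor `c = det(B)`, which is why only the ratios of the Plücker coordinates are attached to the subspace itself.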


As we have seen, Plücker coordinates are defined only up to a common nonzero
factor; the collection of them must be understood as a point in the projective space
$P^{\nu-1}$.

The simplest special case m = 1 returns us to the definition of projective space,
whose points correspond to one-dimensional subspaces $\langle a \rangle$ of some vector space L.
The numbers $p_{i_1,\ldots,i_m}$ in this case become the homogeneous coordinates of a point.
It is therefore not surprising that all of these depend on the choice of a coordinate
system (that is, a basis) of the space L. Following tradition, in the sequel we shall
allow for a certain imprecision and call “Plücker coordinates” of the subspace M
both a point of the projective space $P^{\nu-1}$ and the collection of numbers $p_{i_1,\ldots,i_m}$
specified in this definition.


Theorem 10.2 The Plücker coordinates of a subspace $M \subset L$ uniquely determine
the subspace.


Proof Let us choose an arbitrary basis $a_1, \ldots, a_m$ of the subspace M. It uniquely
determines (and not up to a common factor) the minors $M_{i_1,\ldots,i_m}$, without regard
to the order of the indices $i_1, \ldots, i_m$. The minors are uniquely determined by the
Plücker coordinates (10.4), according to formula (10.3).

A vector $x \in L$ belongs to the subspace $M = \langle a_1, \ldots, a_m \rangle$ if and only if the rank
of the matrix

\[
\overline{M} = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} \\
x_1 & x_2 & \cdots & x_n
\end{pmatrix},
\]

consisting of the coordinates of the vectors $a_1, \ldots, a_m, x$ in some (arbitrary) basis
of the space L, is equal to m, that is, if all the minors of order m + 1 of the matrix $\overline{M}$
are equal to zero. Let us consider the minor that comprises the columns with indices
forming the subset $X = \{k_1, \ldots, k_{m+1}\}$ of the set $N_n = \{1, \ldots, n\}$, where we may
assume that $k_1 < k_2 < \cdots < k_{m+1}$. Expanding it along the last row, we obtain the
equality

\[
\sum_{\alpha \in X} x_\alpha A_\alpha = 0, \tag{10.5}
\]

where $A_\alpha$ is the cofactor of the element $x_\alpha$ in the minor under consideration. But by
definition, the minor corresponding to $A_\alpha$ is obtained from the matrix $\overline{M}$ by deleting
the last row and the column with index α. Therefore, it coincides with one of the
minors of the matrix M, and the indices of its columns are obtained by deleting the
element α from the set X. For writing the sets thus obtained, one frequently uses the
convenient notation

\[
\{k_1, \ldots, \breve{\alpha}, \ldots, k_{m+1}\},
\]

where the notation $\breve{\ }$ signifies the omission of the element so indicated. Thus
relationship (10.5) can be written in the form

\[
\sum_{j=1}^{m+1} (-1)^j x_{k_j} M_{k_1,\ldots,\breve{k}_j,\ldots,k_{m+1}} = 0. \tag{10.6}
\]

Since the minors $M_{i_1,\ldots,i_m}$ of the matrix M are expressed in Plücker coordinates
by formula (10.4), relationships (10.6), obtained from all possible subsets
$X = \{k_1, \ldots, k_{m+1}\}$ of the set $N_n$, also give expressions in terms of Plücker coordinates
of the condition $x \in M$, which completes the proof of the theorem.
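The proof is constructive: relationships (10.6) give a membership test that uses only the Plücker coordinates. The sketch below assumes integer coordinates and 1-based increasing index tuples as dictionary keys; the function names `det`, `plucker_coordinates`, and `contains` and the sample plane are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def det(a):
    """Determinant by Laplace expansion along the first row (exact for integers)."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def plucker_coordinates(rows):
    """Map every increasing index collection (1-based) to the minor (10.4)."""
    m, n = len(rows), len(rows[0])
    return {tuple(c + 1 for c in cols): det([[row[c] for c in cols] for row in rows])
            for cols in combinations(range(n), m)}

def contains(p, m, n, x):
    """Decide whether x lies in M from the Plücker coordinates p, using (10.6)."""
    for ks in combinations(range(1, n + 1), m + 1):
        total = 0
        for j, k in enumerate(ks, start=1):
            rest = ks[:j - 1] + ks[j:]          # the collection ks with k_j omitted
            total += (-1) ** j * x[k - 1] * p[rest]
        if total != 0:
            return False
    return True

# The plane spanned by a_1 = (1, 2, 0, 1) and a_2 = (0, 1, 3, 2).
p = plucker_coordinates([[1, 2, 0, 1], [0, 1, 3, 2]])

inside = contains(p, 2, 4, [1, 4, 6, 5])    # a_1 + 2 a_2
outside = contains(p, 2, 4, [1, 0, 0, 0])   # not a combination of a_1, a_2
print(inside, outside)
```

Since every relationship (10.6) is linear in the Plücker coordinates, multiplying all of them by a common nonzero factor does not change the answer, in agreement with the fact that they are defined only up to such a factor.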


By Theorem 10.2, Plücker coordinates uniquely define the subspace M, but as a
rule, they cannot assume arbitrary values. It is true that for m = 1, the homogeneous
coordinates of a point of projective space can be chosen with arbitrary numbers
(of course, with the exception of the one collection consisting of all zeros). Another
equally simple case is m = n − 1, in which subspaces are hyperplanes corresponding
to points of P(L*). Hyperplanes are defined by their coordinates in this projective
space, which also can be chosen as arbitrary collections of numbers (again with
the exclusion of the collection consisting of all zeros). It is not difficult to verify
that these homogeneous coordinates can differ from Plücker coordinates only by
their signs, that is, by the factor ±1. However, as we shall now see, for an arbitrary
number m < n, the Plücker coordinates are connected to one another by certain
specific relationships.


Example 10.3 Let us consider the next case in order of complexity: n = 4, m = 2.
If we pass to projective spaces corresponding to L and M, then this will give us a
description of the totality of projective lines in three-dimensional projective space
(the case considered by Plücker).

Since n = 4, m = 2, we have $\nu = C_4^2 = 6$, and consequently, each plane $M \subset L$
has six Plücker coordinates:




\[
p_{12}, \quad p_{13}, \quad p_{14}, \quad p_{23}, \quad p_{24}, \quad p_{34}. \tag{10.7}
\]


It is easy to see that for an arbitrary basis of the space L, we may always choose
a basis a, b in the subspace M in such a way that the matrix M given by formula
(10.1) will have the form

\[
M = \begin{pmatrix} 1 & 0 & \alpha & \beta \\ 0 & 1 & \gamma & \delta \end{pmatrix}.
\]

From this follow easily the values of the Plücker coordinates (10.7):

\[
p_{12} = 1, \quad p_{13} = \gamma, \quad p_{14} = \delta, \quad p_{23} = -\alpha, \quad p_{24} = -\beta, \quad
p_{34} = \alpha\delta - \beta\gamma,
\]

which yields the relationship $p_{34} - p_{13} p_{24} + p_{14} p_{23} = 0$. In order to make this
homogeneous, we will use the fact that $p_{12} = 1$, and write it in the form

\[
p_{12} p_{34} - p_{13} p_{24} + p_{14} p_{23} = 0. \tag{10.8}
\]

The relationship (10.8) is already homogeneous, and therefore, it is preserved under
multiplication of all the Plücker coordinates (10.7) by an arbitrary nonzero factor c.
Thus relationship (10.8) remains valid for an arbitrary choice of Plücker coordinates,
and this means that the Plücker coordinates define a point of a certain projective
algebraic variety in 5-dimensional projective space.¹ In the following section, we shall
study an analogous question in the general case, for arbitrary dimension m < n.
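The cancellation behind relationship (10.8) can be written out explicitly. The display below only substitutes the values of the Plücker coordinates found in Example 10.3 for the normalized matrix with entries α, β, γ, δ; it introduces no new assumptions.

```latex
\begin{align*}
p_{12} p_{34} - p_{13} p_{24} + p_{14} p_{23}
  &= 1 \cdot (\alpha\delta - \beta\gamma) - \gamma \cdot (-\beta) + \delta \cdot (-\alpha) \\
  &= \alpha\delta - \beta\gamma + \beta\gamma - \alpha\delta \\
  &= 0.
\end{align*}
```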


10.2 The Plücker Relations and the Grassmannian


We shall now describe the relationships satisfied by Plücker coordinates of an
m-dimensional subspace M of an n-dimensional space L for arbitrary n and m. Here
we shall use the following notation and conventions. Although in the definition
of Plücker coordinates $p_{i_1,\ldots,i_m}$ it was assumed that $i_1 < i_2 < \cdots < i_m$, now we
shall consider numbers $p_{i_1,\ldots,i_m}$ also with other collections of indices. Namely, if
$(j_1, \ldots, j_m)$ is an arbitrary collection of m indices taking the values 1, ..., n, then
we set

\[
p_{j_1,\ldots,j_m} = 0 \tag{10.9}
\]

if some two of the numbers $j_1, \ldots, j_m$ are equal, while if all the numbers $j_1, \ldots, j_m$
are distinct and $(i_1, \ldots, i_m)$ is their arrangement in ascending order, then we set

\[
p_{j_1,\ldots,j_m} = \pm p_{i_1,\ldots,i_m}, \tag{10.10}
\]


¹ This variety is called a quadric.




where the sign + or − depends on whether the permutation that takes $(j_1, \ldots, j_m)$
to $(i_1, \ldots, i_m)$ is even or odd (that is, whether the number of transpositions is even
or odd), according to Theorem 2.25.

In other words, in view of equality (10.3), let us set

\[
p_{j_1,\ldots,j_m} = M_{j_1,\ldots,j_m}, \tag{10.11}
\]

where $(j_1, \ldots, j_m)$ is an arbitrary collection of indices assuming the values 1, ..., n.


Theorem 10.4 For every m-dimensional subspace M of an n-dimensional space L
and for any two sets $(j_1, \ldots, j_{m-1})$ and $(k_1, \ldots, k_{m+1})$ of indices taking the values
1, ..., n, the following relationships hold:

\[
\sum_{r=1}^{m+1} (-1)^r \, p_{j_1,\ldots,j_{m-1},k_r} \cdot p_{k_1,\ldots,\breve{k}_r,\ldots,k_{m+1}} = 0. \tag{10.12}
\]

These are called the Plücker relations.

The notation $k_1, \ldots, \breve{k}_r, \ldots, k_{m+1}$ means that we omit $k_r$ in the sequence
$k_1, \ldots, k_r, \ldots, k_{m+1}$.

Let us note that the indices among the numbers $p_{\alpha_1,\ldots,\alpha_m}$ entering relationship
(10.12) are not necessarily in ascending order, so they are not Plücker coordinates.
But with the aid of relationships (10.9) and (10.10), we can easily express them in
terms of Plücker coordinates. Therefore, relationship (10.12) may also be viewed as
a relationship among Plücker coordinates.


Proof of Theorem 10.4 Returning to the definition of Plücker coordinates in terms of
the minors of the matrix (10.1) and using relationship (10.11), we see that equality
(10.12) can be rewritten in the form

\[
\sum_{r=1}^{m+1} (-1)^r M_{j_1,\ldots,j_{m-1},k_r} \cdot M_{k_1,\ldots,\breve{k}_r,\ldots,k_{m+1}} = 0. \tag{10.13}
\]

Let us show that relationship (10.13) holds for the minors of an arbitrary matrix of
type (m, n). To this end, let us expand the determinant $M_{j_1,\ldots,j_{m-1},k_r}$ along the last
column. Let us denote the cofactor of the element $a_{l k_r}$ of the last column of this
determinant by $A_l$, l = 1, ..., m. Thus the cofactor $A_l$ corresponds to the minor
located in the rows and columns with indices $(1, \ldots, \breve{l}, \ldots, m)$ and $(j_1, \ldots, j_{m-1})$
respectively. Then

\[
M_{j_1,\ldots,j_{m-1},k_r} = \sum_{l=1}^{m} a_{l k_r} A_l.
\]

On substituting this expression into the left-hand side of relationship (10.13), we
arrive at the equality

\[
\sum_{r=1}^{m+1} (-1)^r M_{j_1,\ldots,j_{m-1},k_r} \cdot M_{k_1,\ldots,\breve{k}_r,\ldots,k_{m+1}}
= \sum_{r=1}^{m+1} (-1)^r \Biggl( \sum_{l=1}^{m} a_{l k_r} A_l \Biggr) M_{k_1,\ldots,\breve{k}_r,\ldots,k_{m+1}}.
\]

Changing the order of summation, we obtain

\[
\sum_{r=1}^{m+1} (-1)^r M_{j_1,\ldots,j_{m-1},k_r} \cdot M_{k_1,\ldots,\breve{k}_r,\ldots,k_{m+1}}
= \sum_{l=1}^{m} A_l \Biggl( \sum_{r=1}^{m+1} (-1)^r a_{l k_r} M_{k_1,\ldots,\breve{k}_r,\ldots,k_{m+1}} \Biggr).
\]

But the sum in parentheses is equal, up to sign, to the result of the expansion along
the first row of the determinant of the square matrix of order m + 1 consisting of the
columns of the matrix (10.1) numbered $k_1, \ldots, k_{m+1}$ and rows numbered l, 1, ..., m. This
determinant is equal to

\[
\begin{vmatrix}
a_{l k_1} & a_{l k_2} & \cdots & a_{l k_{m+1}} \\
a_{1 k_1} & a_{1 k_2} & \cdots & a_{1 k_{m+1}} \\
a_{2 k_1} & a_{2 k_2} & \cdots & a_{2 k_{m+1}} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m k_1} & a_{m k_2} & \cdots & a_{m k_{m+1}}
\end{vmatrix} = 0.
\]

Indeed, for arbitrary l = 1, ..., m, two of its rows (numbered 1 and l + 1) coincide,
and this means that the determinant is equal to zero.


Example 10.5 Let us return once more to the case n = 4, m = 2 considered in
the previous section. Relationships (10.12) are here determined by subsets (k) and
(l, m, n) of the set {1, 2, 3, 4}. If, for example, k = 1 and l = 2, m = 3, n = 4, then
we obtain relationship (10.8) introduced earlier. It is easily verified that if all the
numbers k, l, m, n are distinct, then we obtain the same relationship (10.8), while
if among them there are two that are equal, then relationship (10.12) is an identity
(for the proof of this, we can use the antisymmetry of $p_{ij}$ with respect to i and
j). Therefore, in the general case, too (for arbitrary m and n), relationships (10.12)
among the Plücker coordinates are called the Plücker relations.
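The Plücker relations (10.12) can also be checked by brute force on a concrete matrix. The following Python sketch relies on convention (10.11): the number $p_{j_1,\ldots,j_m}$ with arbitrary indices is computed directly as the minor on the listed columns, so that (10.9) and (10.10) hold automatically. The names `p`, `plucker_relation`, and `check_all`, and the randomly generated integer matrices, are assumptions made only for this illustration.

```python
import random
from itertools import product

def det(a):
    """Determinant by Laplace expansion along the first row (exact for integers)."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def p(rows, js):
    """p_{j_1,...,j_m} as in (10.11): the minor on columns js (1-based, any order)."""
    return det([[row[j - 1] for j in js] for row in rows])

def plucker_relation(rows, js, ks):
    """Left-hand side of (10.12) for tuples js (length m - 1) and ks (length m + 1)."""
    total = 0
    for r in range(1, len(ks) + 1):
        k_r = ks[r - 1]
        rest = ks[:r - 1] + ks[r:]              # ks with k_r omitted
        total += (-1) ** r * p(rows, js + (k_r,)) * p(rows, rest)
    return total

def check_all(rows):
    """Evaluate (10.12) for every pair of index collections and test for zero."""
    m, n = len(rows), len(rows[0])
    idx = range(1, n + 1)
    return all(plucker_relation(rows, js, ks) == 0
               for js in product(idx, repeat=m - 1)
               for ks in product(idx, repeat=m + 1))

random.seed(0)
rows_2x4 = [[random.randint(-5, 5) for _ in range(4)] for _ in range(2)]
rows_3x4 = [[random.randint(-5, 5) for _ in range(4)] for _ in range(3)]
print(check_all(rows_2x4), check_all(rows_3x4))
```

For m = 2, n = 4 and the collections (1) and (2, 3, 4), `plucker_relation` returns the left-hand side of (10.8) with the opposite sign, which matches the computation in Example 10.5.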


We have seen that to each subspace M of given dimension m of the space L of
dimension n, there correspond its Plücker coordinates

\[
p_{i_1,\ldots,i_m}, \qquad i_1 < i_2 < \cdots < i_m, \tag{10.14}
\]




satisfying the relationships (10.12). Thus an m-dimensional subspace $M \subset L$ is
determined by its Plücker coordinates (10.14), completely analogously to how points
of a projective space are determined by their homogeneous coordinates (this is in
fact a special case of Plücker coordinates for m = 1). However, for m > 1, the
coordinates of the subspace M cannot be assigned arbitrarily: it is necessary that they
satisfy relationships (10.12). Below, we shall prove that these relationships are also
sufficient for the collection of numbers (10.14) to be Plücker coordinates of some
m-dimensional subspace $M \subset L$. For this, we shall find the following geometric
interpretation of Plücker coordinates useful.

Relationships (10.12) are homogeneous (of degree 2) with respect to the numbers
$p_{i_1,\ldots,i_m}$. After substitution on the basis of formulas (10.9) and (10.10), each of
these relationships remains homogeneous, and thus they define a certain projective
algebraic variety in the projective space $P^{\nu-1}$, called a Grassmann variety or simply
Grassmannian and denoted by G(m, n).

We shall now investigate the Grassmannian G(m, n) in greater detail. 

As we have seen, G(m, n) is contained in the projective space $P^{\nu-1}$, where
$\nu = C_n^m$ (see Sect. 10.1), and the homogeneous coordinates are written as the numbers
(10.14) with all possible increasing collections of indices taking the values 1, ..., n.
The space $P^{\nu-1}$ is the union of affine subsets $U_{i_1,\ldots,i_m}$, each of which is defined by
the condition $p_{i_1,\ldots,i_m} \neq 0$ for some choice of indices $i_1, \ldots, i_m$. From this we obtain

\[
G(m, n) = \bigcup_{i_1,\ldots,i_m} \bigl( G(m, n) \cap U_{i_1,\ldots,i_m} \bigr).
\]

We shall investigate separately one of these subsets $G(m, n) \cap U_{i_1,\ldots,i_m}$, for example,
for simplicity, the subset with indices $(i_1, \ldots, i_m) = (1, \ldots, m)$. The general
case is considered completely analogously and differs only in the numeration of the
coordinates in the space $P^{\nu-1}$. We may assume that for points of our affine subset
$U_{1,\ldots,m}$, the number $p_{1,\ldots,m}$ is equal to 1.

Relationships (10.12) make it possible to express the Plücker coordinates (10.14)
of the subspace M (or equivalently, the minors $M_{i_1,\ldots,i_m}$ of the matrix (10.1)) in the
form of polynomials in those coordinates $p_{i_1,\ldots,i_m}$ for which among the indices
$i_1 < i_2 < \cdots < i_m$, not more than one exceeds m. Any such collection of indices
other than (1, ..., m) obviously has the form $(1, \ldots, \breve{r}, \ldots, m, l)$, where $r \le m$ and $l > m$.
Let us denote the Plücker coordinate corresponding to this collection by $\overline{p}_{rl}$, that is,
we set $\overline{p}_{rl} = p_{1,\ldots,\breve{r},\ldots,m,l}$.

Let us consider an arbitrary ordered collection $j_1 < j_2 < \cdots < j_m$ of numbers
between 1 and n. If the indices $j_k$ are less than or equal to m for all k = 1, ..., m,
then the collection $(j_1, j_2, \ldots, j_m)$ coincides with the collection (1, 2, ..., m), and
since the Plücker coordinate $p_{1,\ldots,m}$ is equal to 1, there is nothing to prove. Thus we
have only to consider the remaining case.

Let $j_k > m$ be one of the numbers $j_1 < j_2 < \cdots < j_m$. Let us use relationship
(10.12), corresponding to the collection $(j_1, \ldots, \breve{j}_k, \ldots, j_m)$ of m − 1 numbers and
the collection $(1, \ldots, m, j_k)$ of m + 1 numbers. In this case, relationship (10.12)
assumes the form

\[
\sum_{r=1}^{m} (-1)^r \, p_{j_1,\ldots,\breve{j}_k,\ldots,j_m,r} \cdot p_{1,\ldots,\breve{r},\ldots,m,j_k}
+ (-1)^{m+1} p_{j_1,\ldots,\breve{j}_k,\ldots,j_m,j_k} = 0,
\]

since $p_{1,\ldots,m} = 1$. In view of the antisymmetry of the expression $p_{j_1,\ldots,j_m}$, it follows
that $p_{j_1,\ldots,j_m} = \pm p_{j_1,\ldots,\breve{j}_k,\ldots,j_m,j_k}$ is equal to the sum (with alternating signs) of the
products $p_{j_1,\ldots,\breve{j}_k,\ldots,j_m,r} \, \overline{p}_{r j_k}$. If among the numbers $j_1, \ldots, j_m$ there were s numbers
exceeding m, then among the numbers $j_1, \ldots, \breve{j}_k, \ldots, j_m, r$, there would be already
s − 1 of them.

Repeating this process as many times as necessary, we will obtain as a result an
expression of the chosen Plücker coordinate $p_{j_1,\ldots,j_m}$ in terms of the coordinates
$\overline{p}_{rl}$, $r \le m$, $l > m$. We have thereby obtained the following important result.


Theorem 10.6 For each point in the set $G(m, n) \cap U_{1,\ldots,m}$, all the Plücker coordinates
(10.14) are polynomials in the coordinates $\overline{p}_{rl} = p_{1,\ldots,\breve{r},\ldots,m,l}$, $r \le m$, $l > m$.


Since the numbers r and l satisfy $1 \le r \le m$ and $m < l \le n$, it follows that all
possible collections of coordinates $\overline{p}_{rl}$ form an affine space V of dimension
m(n − m). By Theorem 10.6, all the remaining Plücker coordinates $p_{i_1,\ldots,i_m}$ are
polynomials in $\overline{p}_{rl}$, and therefore the coordinates $\overline{p}_{rl}$ uniquely define a point of the
set $G(m, n) \cap U_{1,\ldots,m}$. Thus is obtained a natural bijection (given by these polynomials)
between points of the set $G(m, n) \cap U_{1,\ldots,m}$ and points of the affine space V
of dimension m(n − m). Of course, the same is true as well for points of any other
set $G(m, n) \cap U_{i_1,\ldots,i_m}$. In algebraic geometry, this fact is expressed by saying that
the Grassmannian G(m, n) is covered by the affine space of dimension m(n − m).
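It is instructive to compare the dimension m(n − m) of the affine charts with the dimension ν − 1 of the ambient projective space. The small Python sketch below only evaluates these two formulas; the function name `grassmannian_numbers` and the dictionary keys are hypothetical labels introduced for the illustration.

```python
from math import comb

def grassmannian_numbers(m, n):
    """Return nu = C_n^m, the dimension nu - 1 of the ambient projective space,
    and the dimension m(n - m) of each affine chart of G(m, n)."""
    nu = comb(n, m)
    return {"nu": nu, "ambient_dim": nu - 1, "chart_dim": m * (n - m)}

for m, n in [(1, 4), (2, 4), (3, 4), (2, 5), (3, 6)]:
    print((m, n), grassmannian_numbers(m, n))
```

For m = 1 and m = n − 1 the two dimensions coincide, in agreement with the remark that in these cases the Plücker coordinates may be chosen arbitrarily. For (m, n) = (2, 4) the chart has dimension 4 inside a 5-dimensional projective space, which corresponds to the single relationship (10.8).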


Theorem 10.7 Every point of the Grassmannian G(m, n) corresponds to some
m-dimensional subspace $M \subset L$ as described in the previous section.


Proof Since the Grassmannian G(m, n) is the union of sets $G(m, n) \cap U_{i_1,\ldots,i_m}$, it
suffices to prove the theorem for each set separately. We shall carry out the proof
for the set $G(m, n) \cap U_{1,\ldots,m}$, since the rest differ from it only in the numeration of
coordinates.

Let us choose an m-dimensional subspace $M \subset L$ and basis $a_1, \ldots, a_m$ in it so
that in the associated matrix M given by formula (10.1), the elements residing in its
first m columns take the form of the identity matrix E of order m. Then the matrix
M has the form

\[
M = \begin{pmatrix}
1 & 0 & \cdots & 0 & a_{1\,m+1} & \cdots & a_{1n} \\
0 & 1 & \cdots & 0 & a_{2\,m+1} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 & a_{m\,m+1} & \cdots & a_{mn}
\end{pmatrix}. \tag{10.15}
\]

By Theorem 10.6, the Plücker coordinates (10.14) are polynomials in
$\overline{p}_{rl} = p_{1,\ldots,\breve{r},\ldots,m,l}$. Moreover, by the definition of Plücker coordinates (10.4), we have
$p_{1,\ldots,\breve{r},\ldots,m,l} = M_{1,\ldots,\breve{r},\ldots,m,l}$. Here, in the rth row of the minor $M_{1,\ldots,\breve{r},\ldots,m,l}$ of the
matrix (10.15), all elements are equal to zero, except for the element in the last (lth)
column, which is equal to $a_{rl}$. Expanding the minor $M_{1,\ldots,\breve{r},\ldots,m,l}$ along the rth row,
we see that it is equal to $(-1)^{r+m} a_{rl}$. In other words, $\overline{p}_{rl} = (-1)^{r+m} a_{rl}$.

By our construction, all elements $a_{rl}$ of the matrix (10.15) can assume arbitrary
values by the choice of a suitable subspace $M \subset L$ and basis $a_1, \ldots, a_m$ in it. Thus
the Plücker coordinates $\overline{p}_{rl}$ also assume arbitrary values. It remains to observe that
by Theorem 10.6, all remaining Plücker coordinates are polynomials in $\overline{p}_{rl}$, and
consequently, for the constructed subspace M, they determine the given point of the
set $G(m, n) \cap U_{1,\ldots,m}$.
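The sign rule $\overline{p}_{rl} = (-1)^{r+m} a_{rl}$ used in this proof is easy to confirm on an example. The Python sketch below builds a matrix of the form (10.15) from an arbitrarily chosen integer block and computes the corresponding minors directly; the names `det` and `chart_coordinates` and the numerical entries are illustrative assumptions.

```python
def det(a):
    """Determinant by Laplace expansion along the first row (exact for integers)."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def chart_coordinates(M, m):
    """Minors on the columns (1, ..., r omitted, ..., m, l) for r <= m < l (1-based)."""
    n = len(M[0])
    coords = {}
    for r in range(1, m + 1):
        for l in range(m + 1, n + 1):
            cols = [c for c in range(1, m + 1) if c != r] + [l]
            coords[(r, l)] = det([[row[c - 1] for c in cols] for row in M])
    return coords

m = 3
block = [[2, -1],
         [0, 4],
         [5, 3]]                      # the entries a_{rl} with l = m + 1, ..., n
M = [[1 if i == j else 0 for j in range(m)] + block[i] for i in range(m)]

coords = chart_coordinates(M, m)
print(coords)
print(all(coords[(r, l)] == (-1) ** (r + m) * M[r - 1][l - 1] for (r, l) in coords))
```

Since the block of entries $a_{rl}$ can be filled with arbitrary numbers, the computed coordinates also take arbitrary values, which is the point of the argument above.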


10.3 The Exterior Product 


Now we shall attempt to understand the sense in which the subspace $M \subset L$ is related
to its Plücker coordinates, after separating out those parts of the construction that
depend on the choice of bases $e_1, \ldots, e_n$ in L and $a_1, \ldots, a_m$ in M from those that
do not depend on the choice of basis.

Our definition of Pliicker coordinates was connected with the minors of the ma- 
trix M given by formula (10.1), and since minors (like all determinants) are multilin- 
ear and antisymmetric functions of the rows (and columns), let us begin by recalling 
the appropriate definitions from Sect. 2.6 (especially because now we shall need 
them in a somewhat changed form). Namely, while in Chap. 2, we considered only 
functions of rows, now we shall consider functions of vectors belonging to an arbi- 
trary vector space L. We shall assume that the space L is finite-dimensional. Then 
by Theorem 3.64, it is isomorphic to the space of rows of length n = dimL, and so 
we might have used the definitions from Sect. 2.6. But such an isomorphism itself 
depends on the choice of basis in the space L, and our goal is precisely to study the 
dependence of our construction on the choice of basis. 


Definition 10.8 A function $F(x_1, \ldots, x_m)$ in m vectors of the space L taking
numeric values is said to be multilinear if for every index i in the range 1 to m and
arbitrary fixed vectors $a_1, \ldots, \breve{a}_i, \ldots, a_m$,

\[
F(a_1, \ldots, a_{i-1}, x_i, a_{i+1}, \ldots, a_m)
\]

is a linear function of the vector $x_i$.


For m = 1, we arrive at the notion of linear function introduced in Sect. 3.7, and 
for m = 2, this is the notion of bilinear form, introduced in Sect. 6.1. 

The definition of antisymmetric function given in Sect. 2.6 was valid for every
set, and in particular, we may apply it to the set of all vectors of the space L.
According to this definition, for every pair of distinct indices r and s in the range 1 to
m, the relationship

\[
F(x_1, \ldots, x_r, \ldots, x_s, \ldots, x_m) = -F(x_1, \ldots, x_s, \ldots, x_r, \ldots, x_m) \tag{10.16}
\]




must be satisfied for every collection of vectors $x_1, \ldots, x_m \in L$. As proved in
Sect. 2.6, it suffices to prove property (10.16) for s = r + 1, that is, when a transposition
of two neighboring vectors from the collection $x_1, \ldots, x_m$ is performed. Then
property (10.16) will also be satisfied for arbitrary indices r and s. In view of this,
we shall often formulate the condition of antisymmetry only for “neighboring”
indices and use the fact that it then holds for two arbitrary indices r and s.

If the values of F are elements of a field of characteristic different from 2, then it
follows that $F(x_1, \ldots, x_m) = 0$ if any two of the vectors $x_1, \ldots, x_m$ coincide.

Let us denote by $\Pi^m(L)$ the collection of all multilinear functions of m vectors of
the space L, and by $\Omega^m(L)$ the collection of all antisymmetric functions in $\Pi^m(L)$.
The sets $\Pi^m(L)$ and $\Omega^m(L)$ become vector spaces if for all $F, G \in \Pi^m(L)$ we define
their sum $H = F + G \in \Pi^m(L)$ by the formula

\[
H(x_1, \ldots, x_m) = F(x_1, \ldots, x_m) + G(x_1, \ldots, x_m)
\]

and define for every function $F \in \Pi^m(L)$ the product by the scalar α as the function
$H = \alpha F \in \Pi^m(L)$ according to the formula

\[
H(x_1, \ldots, x_m) = \alpha F(x_1, \ldots, x_m).
\]

It directly follows from these definitions that $\Pi^m(L)$ is thereby converted to a vector
space, and $\Omega^m(L) \subset \Pi^m(L)$ is a subspace of $\Pi^m(L)$.

Let dim L = n, and let $e_1, \ldots, e_n$ be some basis of the space L. It follows from
the definition that the multilinear function $F(x_1, \ldots, x_m)$ is defined for all collections
of vectors $(x_1, \ldots, x_m)$ if it is defined for those collections whose vectors $x_i$
belong to our basis. Indeed, repeating verbatim the arguments from Sect. 2.7 that we
used in the proof of Theorem 2.29, we obtain for $F(x_1, \ldots, x_m)$ the same formulas
(2.40) and (2.43). Thus for the chosen basis $e_1, \ldots, e_n$, the multilinear function
$F(x_1, \ldots, x_m)$ is determined by its values $F(e_{i_1}, \ldots, e_{i_m})$, where $i_1, \ldots, i_m$ are all
possible collections of numbers from the set $N_n = \{1, \ldots, n\}$.

The previous line of reasoning shows that the space $\Pi^m(L)$ is isomorphic to
the space of functions on the set $N_n^m = N_n \times \cdots \times N_n$ (m-fold product). It follows
that the dimension of the space $\Pi^m(L)$ is finite and coincides with the number of
elements of the set $N_n^m$. It is easy to verify that this number is equal to $n^m$, and so
$\dim \Pi^m(L) = n^m$.

As we observed in Example 3.36, in a space of functions f on a finite
set $N_n^m$, there exists a basis consisting of δ-functions assuming the value 1 on one
element of $N_n^m$ and the value 0 on all the other elements. In our case, we shall
introduce a special notation for such a basis. Let $I = (i_1, \ldots, i_m)$ be an arbitrary
element of the set $N_n^m$. Then we denote by $f_I$ the function taking the value 1 at the
element I and the value 0 on all remaining elements of the set $N_n^m$.

We now move on to an examination of the subspace of antisymmetric multilinear
functions $\Omega^m(L)$, assuming as previously that there has been chosen in L some basis
$e_1, \ldots, e_n$. To verify that a multilinear function F is antisymmetric, it is necessary
and sufficient that property (10.16) be satisfied for the vectors $e_i$ of the basis. In
other words, this reduces to the relationships

\[
F(e_{i_1}, \ldots, e_{i_r}, e_{i_{r+1}}, \ldots, e_{i_m}) = -F(e_{i_1}, \ldots, e_{i_{r+1}}, e_{i_r}, \ldots, e_{i_m})
\]

for all collections of vectors $e_{i_1}, \ldots, e_{i_m}$ in the chosen basis $e_1, \ldots, e_n$ of the
space L. Therefore, for every function $F \in \Omega^m(L)$ and every collection
$(j_1, \ldots, j_m) \in N_n^m$, we have the equality

\[
F(e_{j_1}, \ldots, e_{j_m}) = \pm F(e_{i_1}, \ldots, e_{i_m}), \tag{10.17}
\]

where the numbers $i_1, \ldots, i_m$ are the same as $j_1, \ldots, j_m$, but arranged in ascending
order $i_1 < i_2 < \cdots < i_m$, while the sign + or − in (10.17) depends on whether the
number of transpositions necessary for passing from the collection $(i_1, \ldots, i_m)$ to
the collection $(j_1, \ldots, j_m)$ is even or odd (we note that if any two of the numbers
$j_1, \ldots, j_m$ are equal, then both sides of equality (10.17) become equal to zero).
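The bookkeeping behind equality (10.17) consists of sorting the indices and keeping track of the parity of the transpositions used. A minimal Python sketch is given below; the function name `sorted_with_sign` and the convention of returning the sign 0 for a collection with repeated indices are assumptions made for this illustration.

```python
def sorted_with_sign(js):
    """Return (sign, increasing tuple) for the index collection js.

    The sign is +1 or -1 according to the parity of the number of
    transpositions of neighbours needed to sort js, and 0 if some
    index occurs twice (then both sides of (10.17) vanish).
    """
    js = list(js)
    if len(set(js)) < len(js):
        return 0, tuple(sorted(js))
    sign = 1
    # Bubble sort: each swap of two neighbouring entries is one transposition.
    for i in range(len(js)):
        for k in range(len(js) - 1 - i):
            if js[k] > js[k + 1]:
                js[k], js[k + 1] = js[k + 1], js[k]
                sign = -sign
    return sign, tuple(js)

for js in [(3, 1, 2), (2, 1, 3), (4, 3, 2, 1), (2, 2, 1)]:
    print(js, sorted_with_sign(js))
```

With this helper, the value of an antisymmetric multilinear function on basis vectors with indices `js` is the returned sign times its value on the increasing collection, so that only the values on increasing collections need to be stored.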
Reasoning just as in the case of the space $\Pi^m(L)$, we conclude that the space
$\Omega^m(L)$ is isomorphic to the space of functions on the set $\overrightarrow{N}_n^m \subset N_n^m$, which consists
of all increasing sets $I = (i_1, \ldots, i_m)$, that is, those for which $i_1 < i_2 < \cdots < i_m$.
From this it follows in particular that $\Omega^m(L) = (0)$ if m > n. It is easy to see that
the number of such increasing sets I is equal to $C_n^m$, and therefore,

\[
\dim \Omega^m(L) = C_n^m. \tag{10.18}
\]

We shall denote by $F_I$ the δ-function of the space $\Omega^m(L)$, taking the value 1 on the
set $I \in \overrightarrow{N}_n^m$ and the value 0 on all the remaining sets in $\overrightarrow{N}_n^m$.
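The two dimension counts, $n^m$ for all multilinear functions and $C_n^m$ for the antisymmetric ones, can be confirmed by enumerating the index sets. The following Python sketch does this for small n and m; the function names `all_collections` and `increasing_collections` are hypothetical.

```python
from itertools import combinations, product
from math import comb

def all_collections(n, m):
    """All collections (i_1, ..., i_m) with entries in {1, ..., n}."""
    return list(product(range(1, n + 1), repeat=m))

def increasing_collections(n, m):
    """All increasing collections i_1 < ... < i_m with entries in {1, ..., n}."""
    return list(combinations(range(1, n + 1), m))

n, m = 5, 3
print(len(all_collections(n, m)), n ** m)
print(len(increasing_collections(n, m)), comb(n, m))
print(increasing_collections(3, 4))   # empty list: no increasing collection when m > n
```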

The vectors $a_1, \ldots, a_m \in L$ determine on the space $\Omega^m(L)$ a linear function φ
given by the relationship

\[
\varphi(F) = F(a_1, \ldots, a_m) \tag{10.19}
\]

for an arbitrary element $F \in \Omega^m(L)$. Thus φ is a linear function on $\Omega^m(L)$, that is,
an element of the dual space $\Omega^m(L)^*$.


Definition 10.9 The dual space $\Lambda^m(L) = \Omega^m(L)^*$ is called the space of m-vectors
or the mth exterior power of the space L, and its elements are called m-vectors.
A vector $\varphi \in \Lambda^m(L)$ constructed with the help of relationship (10.19) involving the
vectors $a_1, \ldots, a_m$ is called the exterior product (or wedge product) of $a_1, \ldots, a_m$
and is denoted by

\[
\varphi = a_1 \wedge a_2 \wedge \cdots \wedge a_m.
\]


Now let us explore the connection between the exterior product and Plücker
coordinates of the subspace $M \subset L$. To this end, it is necessary to choose some basis
$e_1, \ldots, e_n$ in L and some basis $a_1, \ldots, a_m$ in M. The Plücker coordinates of the
subspace M take the form (10.4), where $M_{i_1,\ldots,i_m}$ is the minor of the matrix (10.1) that
resides in columns $i_1, \ldots, i_m$ and is an antisymmetric function of its columns. Let
us introduce for the Plücker coordinates and associated minors the notation

\[
p_I = p_{i_1,\ldots,i_m}, \qquad M_I = M_{i_1,\ldots,i_m}, \qquad \text{where } I = (i_1, \ldots, i_m) \in \overrightarrow{N}_n^m.
\]




To the basis of the space $\Omega^m(L)$ consisting of δ-functions $F_I$, there corresponds
the dual basis of the dual space $\Lambda^m(L)$, whose vectors we shall denote by $\varphi_I$. Using
the notation that we introduced in Sect. 3.7, we may say that the dual basis is defined
by the condition

\[
(F_I, \varphi_I) = 1 \quad \text{for all } I \in \overrightarrow{N}_n^m, \qquad
(F_I, \varphi_J) = 0 \quad \text{for all } I \neq J. \tag{10.20}
\]

In particular, the vector $\varphi = a_1 \wedge a_2 \wedge \cdots \wedge a_m$ of the space $\Lambda^m(L)$ can be expressed
as a linear combination of vectors in this basis:

\[
\varphi = \sum_{I \in \overrightarrow{N}_n^m} \lambda_I \varphi_I \tag{10.21}
\]

with certain coefficients $\lambda_I$. Using formulas (10.19) and (10.20), we obtain the
following equality:

\[
\lambda_I = \varphi(F_I) = F_I(a_1, \ldots, a_m).
\]


For determining the values $F_I(a_1, \dots, a_m)$, we may make use of Theorem 2.29;
see formulas (2.40) and (2.43). Since $F_I(e_{j_1}, \dots, e_{j_m}) = 0$ when the indices of
$e_{j_1}, \dots, e_{j_m}$ form a collection $J \neq I$, it follows from formula (2.43) that
the values $F_I(a_1, \dots, a_m)$ depend only on the elements appearing in the minor
$M_I$. The minor $M_I$ is a linear and antisymmetric function of its rows. In view of
the fact that by definition, $F_I(e_{i_1}, \dots, e_{i_m}) = 1$, we obtain from Theorem 2.15 that
$F_I(a_1, \dots, a_m) = M_I = p_I$. In other words, we have the equality

$$\varphi = a_1 \wedge a_2 \wedge \cdots \wedge a_m = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I \varphi_I = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} p_I \varphi_I. \tag{10.22}$$


Thus any collection of m vectors $a_1, \dots, a_m$ uniquely determines the vector
$a_1 \wedge \cdots \wedge a_m$ in the space $\Lambda^m(L)$, where the Plücker coordinates of the subspace
$\langle a_1, \dots, a_m \rangle$ are the coordinates of this vector $a_1 \wedge \cdots \wedge a_m$ with respect to the basis
$\varphi_I$, $I \in \overrightarrow{\mathbb{N}}_n^m$, of the space $\Lambda^m(L)$. Like all coordinates, they depend on this basis,
which itself is constructed as the dual basis to some basis of the space $\Omega^m(L)$.


Definition 10.10 A vector $x \in \Lambda^m(L)$ is said to be decomposable if it can be
represented as an exterior product

$$x = a_1 \wedge a_2 \wedge \cdots \wedge a_m \tag{10.23}$$

with some $a_1, \dots, a_m \in L$.


Let the m-vector x have coordinates $x_{i_1,\dots,i_m}$ in some basis $\varphi_I$, $I \in \overrightarrow{\mathbb{N}}_n^m$, of the
space $\Lambda^m(L)$. As in the case of an arbitrary vector space, the coordinates $x_{i_1,\dots,i_m}$
can assume arbitrary values in the associated field. In order for an m-vector x to
be decomposable, that is, for it to satisfy the relationship (10.23) with some vectors
$a_1, \dots, a_m \in L$, it is necessary and sufficient that its coordinates $x_{i_1,\dots,i_m}$ coincide
with the Plücker coordinates $p_{i_1,\dots,i_m}$ of the subspace $M = \langle a_1, \dots, a_m \rangle$ in L. But
as we established in the previous section, the collection of Plücker coordinates of
a subspace $M \subset L$ cannot be an arbitrary collection of $\nu$ numbers, but only one
that satisfies the Plücker relations (10.12). Consequently, the Plücker relations give
necessary and sufficient conditions for an m-vector x to be decomposable.

Thus for the specification of m-dimensional subspaces $M \subset L$, we need only
the decomposable m-vectors (the indecomposable m-vectors correspond to no
m-dimensional subspace). However, generally speaking, the decomposable vectors do
not form a vector space (the sum of two decomposable vectors might be an
indecomposable vector), and also, as is easily verified, the set of decomposable vectors
is not contained in any subspace of the space $\Lambda^m(L)$ other than $\Lambda^m(L)$ itself. In
many problems, it is more natural to deal with vector spaces, and this is the reason
for introducing the notion of a space $\Lambda^m(L)$ that contains all m-vectors, including
those that are indecomposable.

Let us note that the basis vectors $\varphi_I$ themselves are decomposable: they are
determined by the conditions (10.20), which, as is easily verified, taking into account
the equality $(F_J, \varphi_I) = F_J(e_{i_1}, \dots, e_{i_m})$, means that for a vector $x = \varphi_I$, we have the
representation (10.23) for $a_1 = e_{i_1}, \dots, a_m = e_{i_m}$, that is,

$$\varphi_I = e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_m}, \quad I = (i_1, \dots, i_m).$$

If $e_1, \dots, e_n$ is a basis of the space L, then the vectors $e_{i_1} \wedge \cdots \wedge e_{i_m}$ for all
possible increasing collections of indices $(i_1, \dots, i_m)$ form a basis of the space
$\Lambda^m(L)$, dual to the basis $F_I$ of the space $\Omega^m(L)$ that we considered above. Thus
every m-vector is a linear combination of decomposable vectors.
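
The statements above can be tried out numerically. The following is only a minimal sketch, assuming integer coordinates in a fixed basis of a 4-dimensional space and m = 2; the helper names (`det`, `wedge_coordinates`, `plucker_residual`) are ours and do not come from the text. It computes the coordinates of $a_1 \wedge a_2$ as the minors from (10.22) and evaluates the standard Plücker relation $p_{12}p_{34} - p_{13}p_{24} + p_{14}p_{23} = 0$ for n = 4, m = 2 (we assume this is the form that (10.12) takes in this case).

```python
from itertools import combinations


def det(rows):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if not rows:
        return 1
    total = 0
    for j, entry in enumerate(rows[0]):
        minor = [r[:j] + r[j + 1:] for r in rows[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def wedge_coordinates(vectors):
    """Coordinates p_I of a_1 ^ ... ^ a_m in the basis phi_I (I increasing).

    By (10.22), p_I is the minor of the m x n coordinate matrix in columns I.
    Indices are 0-based here, so (0, 1) stands for the collection I = (1, 2).
    """
    m, n = len(vectors), len(vectors[0])
    return {I: det([[row[j] for j in I] for row in vectors])
            for I in combinations(range(n), m)}


def plucker_residual(p):
    """p12*p34 - p13*p24 + p14*p23 for a 2-vector in a 4-dimensional space."""
    return p[(0, 1)] * p[(2, 3)] - p[(0, 2)] * p[(1, 3)] + p[(0, 3)] * p[(1, 2)]


# A decomposable 2-vector a1 ^ a2.
a1, a2 = [1, 2, 0, 3], [0, 1, 4, -1]
p = wedge_coordinates([a1, a2])

# The sum e1^e2 + e3^e4 of two decomposable 2-vectors.
e = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
p12 = wedge_coordinates([e[0], e[1]])
p34 = wedge_coordinates([e[2], e[3]])
s = {I: p12[I] + p34[I] for I in p12}

print(plucker_residual(p), plucker_residual(s))
```

The residual vanishes for $a_1 \wedge a_2$ but equals 1 for $e_1 \wedge e_2 + e_3 \wedge e_4$, so this sum of two decomposable vectors is itself indecomposable, in line with the remark that the decomposable vectors do not form a vector space.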

The exterior product $a_1 \wedge \cdots \wedge a_m$ is a function of m vectors $a_i \in L$ with values in
the space $\Lambda^m(L)$. Let us now establish some of its properties. The first two of these
are an analogue of multilinearity, and the third is an analogue of antisymmetry, but
taking into account that the exterior product is not a number, but a vector of the
space $\Lambda^m(L)$.


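Property 10.11 For every $i \in \{1, \dots, m\}$ and all vectors $a_i, b, c \in L$ the following
relationship is satisfied:

$$a_1 \wedge \cdots \wedge a_{i-1} \wedge (b + c) \wedge a_{i+1} \wedge \cdots \wedge a_m
= a_1 \wedge \cdots \wedge a_{i-1} \wedge b \wedge a_{i+1} \wedge \cdots \wedge a_m
+ a_1 \wedge \cdots \wedge a_{i-1} \wedge c \wedge a_{i+1} \wedge \cdots \wedge a_m. \tag{10.24}$$

Indeed, by definition, the exterior product

$$a_1 \wedge \cdots \wedge a_{i-1} \wedge (b + c) \wedge a_{i+1} \wedge \cdots \wedge a_m$$

is a linear function on the space $\Omega^m(L)$ associating with each function $F \in \Omega^m(L)$
the number $F(a_1, \dots, a_{i-1}, b + c, a_{i+1}, \dots, a_m)$. Since the function F is multilinear,
it follows that

$$F(a_1, \dots, a_{i-1}, b + c, a_{i+1}, \dots, a_m)
= F(a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_m) + F(a_1, \dots, a_{i-1}, c, a_{i+1}, \dots, a_m),$$

which proves equality (10.24).
The following two properties are just as easily verified.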
The following two properties are just as easily verified. 


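Property 10.12 For every number $\alpha$ and all vectors $a_i \in L$, the following
relationship holds:

$$a_1 \wedge \cdots \wedge a_{i-1} \wedge (\alpha a_i) \wedge a_{i+1} \wedge \cdots \wedge a_m
= \alpha (a_1 \wedge \cdots \wedge a_{i-1} \wedge a_i \wedge a_{i+1} \wedge \cdots \wedge a_m). \tag{10.25}$$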


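Property 10.13 For all pairs of indices $r, s \in \{1, \dots, m\}$ and all vectors $a_i \in L$, the
following relationship holds:

$$a_1 \wedge \cdots \wedge a_{s-1} \wedge a_s \wedge a_{s+1} \wedge \cdots \wedge a_{r-1} \wedge a_r \wedge a_{r+1} \wedge \cdots \wedge a_m
= -\, a_1 \wedge \cdots \wedge a_{s-1} \wedge a_r \wedge a_{s+1} \wedge \cdots \wedge a_{r-1} \wedge a_s \wedge a_{r+1} \wedge \cdots \wedge a_m, \tag{10.26}$$

that is, if any two vectors from among $a_1, \dots, a_m$ change places, the exterior
product changes sign.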


If (as we assume) the numbers are elements of a field of characteristic different 
from 2 (for example, R or C), then Property 10.13 yields the following corollary. 


Corollary 10.14 If any two of the vectors $a_1, \dots, a_m$ are equal, then
$a_1 \wedge \cdots \wedge a_m = 0$.


Generalizing the definition given above, we may express Properties 10.11, 10.12,
and 10.13 by saying that the exterior product $a_1 \wedge \cdots \wedge a_m$ is a multilinear
antisymmetric function of the vectors $a_1, \dots, a_m \in L$ taking values in the space $\Lambda^m(L)$.


Property 10.15 Vectors $a_1, \dots, a_m$ are linearly dependent if and only if

$$a_1 \wedge \cdots \wedge a_m = 0. \tag{10.27}$$

Proof Let us assume that the vectors $a_1, \dots, a_m$ are linearly dependent. Then one
of them is a linear combination of the rest. Let it be the vector $a_m$ (the other cases
are reduced to this one by a change in numeration). Then

$$a_m = \alpha_1 a_1 + \cdots + \alpha_{m-1} a_{m-1},$$

and on the basis of Properties 10.11 and 10.12, we obtain that

$$a_1 \wedge \cdots \wedge a_{m-1} \wedge a_m
= \alpha_1 (a_1 \wedge \cdots \wedge a_{m-1} \wedge a_1) + \cdots + \alpha_{m-1} (a_1 \wedge \cdots \wedge a_{m-1} \wedge a_{m-1}).$$

In view of Corollary 10.14, each term on the right-hand side of this equality is equal
to zero, and consequently, we have $a_1 \wedge \cdots \wedge a_m = 0$.

Let us assume now that the vectors $a_1, \dots, a_m$ are linearly independent. We
must prove that $a_1 \wedge \cdots \wedge a_m \neq 0$. Equality (10.27) would mean that the function
$a_1 \wedge \cdots \wedge a_m$ (as an element of the space $\Lambda^m(L)$) assigns to an arbitrary function
$F \in \Omega^m(L)$ the value $F(a_1, \dots, a_m) = 0$. However, in contradiction to this, it is
possible to produce a function $F \in \Omega^m(L)$ for which $F(a_1, \dots, a_m) \neq 0$. Indeed,
let us represent the space L as a direct sum

$$L = \langle a_1, \dots, a_m \rangle \oplus L',$$

where $L' \subset L$ is some subspace of dimension n − m, and for every vector $z \in L$, let
us consider the corresponding decomposition $z = x + y$, where $x \in \langle a_1, \dots, a_m \rangle$
and $y \in L'$. Finally, for vectors

$$z_i = \alpha_{i1} a_1 + \cdots + \alpha_{im} a_m + y_i, \quad y_i \in L', \ i = 1, \dots, m,$$

let us define a function F by the condition $F(z_1, \dots, z_m) = |(\alpha_{ij})|$. As we saw
in Sect. 2.6, the determinant is a multilinear antisymmetric function of its rows,
and hence $F \in \Omega^m(L)$. Moreover, $F(a_1, \dots, a_m) = |E| = 1$, which proves our assertion.
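
As a small illustration of Property 10.15 (our own example, with $e_1, e_2$ denoting two linearly independent vectors of L and the field of characteristic different from 2), Properties 10.11–10.13 and Corollary 10.14 give the following for one dependent and one independent pair:

```latex
\begin{align*}
(e_1 + e_2) \wedge (2e_1 + 2e_2)
  &= 2\,(e_1 + e_2) \wedge (e_1 + e_2) = 0, \\
(e_1 + e_2) \wedge (e_1 - e_2)
  &= e_1 \wedge e_1 - e_1 \wedge e_2 + e_2 \wedge e_1 - e_2 \wedge e_2
   = -2\, e_1 \wedge e_2 \neq 0 .
\end{align*}
```

In the second line the coefficient −2 is the determinant of the coordinate matrix with rows (1, 1) and (1, −1), which agrees with the description of the coordinates of an exterior product as minors.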


Let L and M be arbitrary vector spaces, and let $\mathcal{A}: L \to M$ be a linear
transformation. It defines the transformation

$$\Omega^p(\mathcal{A}): \Omega^p(M) \to \Omega^p(L), \tag{10.28}$$

which assigns to each antisymmetric function $F(y_1, \dots, y_p)$ in the space $\Omega^p(M)$
an antisymmetric function $G(x_1, \dots, x_p)$ in the space $\Omega^p(L)$ by the formula

$$G(x_1, \dots, x_p) = F(\mathcal{A}(x_1), \dots, \mathcal{A}(x_p)), \quad x_1, \dots, x_p \in L. \tag{10.29}$$


A simple verification shows that this transformation is linear. Let us note that we
have already met with such a transformation in the case p = 1, namely the dual
transformation $\mathcal{A}^*: M^* \to L^*$ (see Sect. 3.7). In the general case, passing to the dual
spaces $\Lambda^p(L) = \Omega^p(L)^*$ and $\Lambda^p(M) = \Omega^p(M)^*$, we define the linear transformation

$$\Lambda^p(\mathcal{A}): \Lambda^p(L) \to \Lambda^p(M), \tag{10.30}$$

dual to the transformation (10.28).
Let us note the most important properties of the transformation (10.30).




Lemma 10.16 Let $\mathcal{A}: L \to M$ and $\mathcal{B}: M \to N$ be linear transformations of
arbitrary vector spaces L, M, N. Then

$$\Lambda^p(\mathcal{B}\mathcal{A}) = \Lambda^p(\mathcal{B})\, \Lambda^p(\mathcal{A}).$$


Proof In view of the definition (10.30) and the properties of dual transformations
(formula (3.61)) established in Sect. 3.7, it suffices to ascertain that

$$\Omega^p(\mathcal{B}\mathcal{A}) = \Omega^p(\mathcal{A})\, \Omega^p(\mathcal{B}). \tag{10.31}$$

But equality (10.31) follows directly from the definition. Indeed, the transformation
$\Omega^p(\mathcal{A})$ maps the function $F(y_1, \dots, y_p)$ in the space $\Omega^p(M)$ to the function
$G(x_1, \dots, x_p)$ in $\Omega^p(L)$ by formula (10.29). In just the same way, the
transformation $\Omega^p(\mathcal{B})$ maps the function $H(z_1, \dots, z_p)$ in $\Omega^p(N)$ to the function
$F(y_1, \dots, y_p)$ in $\Omega^p(M)$ by the analogous formula

$$F(y_1, \dots, y_p) = H(\mathcal{B}(y_1), \dots, \mathcal{B}(y_p)), \quad y_1, \dots, y_p \in M. \tag{10.32}$$

Finally, the transformation $\Omega^p(\mathcal{B}\mathcal{A})$ corresponding to $\mathcal{B}\mathcal{A}: L \to N$ takes the
function $H(z_1, \dots, z_p)$ in the space $\Omega^p(N)$ to the function $G(x_1, \dots, x_p)$ in the
space $\Omega^p(L)$ by the formula

$$G(x_1, \dots, x_p) = H(\mathcal{B}\mathcal{A}(x_1), \dots, \mathcal{B}\mathcal{A}(x_p)), \quad x_1, \dots, x_p \in L. \tag{10.33}$$

Substituting into (10.33) the vector $y_i = \mathcal{A}(x_i)$ and comparing the relationship thus
obtained with (10.32), we obtain the required equality (10.31).


Lemma 10.17 For all vectors $x_1, \dots, x_p \in L$, we have the equality

$$\Lambda^p(\mathcal{A})(x_1 \wedge \cdots \wedge x_p) = \mathcal{A}(x_1) \wedge \cdots \wedge \mathcal{A}(x_p). \tag{10.34}$$


Proof Both sides of equality (10.34) are elements of the space $\Lambda^p(M) = \Omega^p(M)^*$,
that is, they are linear functions on $\Omega^p(M)$. It suffices to verify that their
application to any function $F(y_1, \dots, y_p)$ in the space $\Omega^p(M)$ gives one and the same
result. But as follows from the definition, in both cases, this result is equal to
$F(\mathcal{A}(x_1), \dots, \mathcal{A}(x_p))$.
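
Lemmas 10.16 and 10.17 can be checked on concrete matrices. The sketch below is an illustration under assumptions that are not stated in the text: we fix bases, represent $\mathcal{A}$ and $\mathcal{B}$ by integer matrices acting on coordinate columns, and use the standard fact (a consequence of the Cauchy–Binet formula) that in the bases $e_I$ the matrix of $\Lambda^p(\mathcal{A})$ consists of the p × p minors of the matrix of $\mathcal{A}$. All function names are ours.

```python
from itertools import combinations


def det(rows):
    """Determinant by Laplace expansion along the first row (fine for tiny matrices)."""
    if not rows:
        return 1
    return sum((-1) ** j * entry * det([r[:j] + r[j + 1:] for r in rows[1:]])
               for j, entry in enumerate(rows[0]))


def matmul(b, a):
    """Matrix product b * a, that is, the matrix of the composition BA."""
    return [[sum(b[i][k] * a[k][j] for k in range(len(a))) for j in range(len(a[0]))]
            for i in range(len(b))]


def apply(a, x):
    """Coordinates of A(x) for a vector x given by its coordinate list."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in a]


def wedge_coordinates(vectors):
    """Coordinates of x_1 ^ ... ^ x_p in the basis e_I, I increasing (minors, as in (10.22))."""
    p, n = len(vectors), len(vectors[0])
    return [det([[v[j] for j in I] for v in vectors]) for I in combinations(range(n), p)]


def exterior_power(a, p):
    """Assumed matrix of Lambda^p(A): entry (K, I) is the minor of A in rows K, columns I."""
    n_out, n_in = len(a), len(a[0])
    return [[det([[a[k][i] for i in I] for k in K]) for I in combinations(range(n_in), p)]
            for K in combinations(range(n_out), p)]


A = [[1, 2, 0], [0, 1, 3], [2, 0, 1], [1, 1, 1]]   # A: L -> M, dim L = 3, dim M = 4
B = [[1, 0, 1, 2], [2, 1, 0, 0], [0, 1, 1, 1]]     # B: M -> N, dim N = 3
x1, x2 = [1, 0, 2], [3, 1, -1]

# Lemma 10.17: Lambda^2(A)(x1 ^ x2) versus A(x1) ^ A(x2).
lhs = apply(exterior_power(A, 2), wedge_coordinates([x1, x2]))
rhs = wedge_coordinates([apply(A, x1), apply(A, x2)])

# Lemma 10.16: Lambda^2(BA) versus Lambda^2(B) Lambda^2(A).
power_of_product = exterior_power(matmul(B, A), 2)
product_of_powers = matmul(exterior_power(B, 2), exterior_power(A, 2))

print(lhs == rhs, power_of_product == product_of_powers)
```

Both comparisons are exact because only integer arithmetic is involved; the example deliberately uses spaces of different dimensions, since the lemmas do not require L, M, N to coincide.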


Finally, we shall prove a property of the exterior product that is sometimes called 
universality. 


Property 10.18 Any mapping that assigns to m vectors $a_1, \dots, a_m$ of the space L
a vector $[a_1, \dots, a_m]$ of some space M and satisfies Properties 10.11, 10.12, 10.13 (p. 362)
can be obtained from the exterior product $a_1 \wedge \cdots \wedge a_m$ by applying some
uniquely defined linear transformation $\mathcal{A}: \Lambda^m(L) \to M$.


In other words, there exists a linear transformation $\mathcal{A}: \Lambda^m(L) \to M$ such that for
every collection $a_1, \dots, a_m$ of vectors of the space L, we have the equality

$$[a_1, \dots, a_m] = \mathcal{A}(a_1 \wedge \cdots \wedge a_m), \tag{10.35}$$

which can be represented by the following diagram:

$$\begin{array}{ccc}
L^m & \longrightarrow & M \\
\downarrow & \nearrow {\scriptstyle \mathcal{A}} & \\
\Lambda^m(L) & &
\end{array} \tag{10.36}$$

In this diagram, $[a_1, \dots, a_m] = \mathcal{A}(a_1 \wedge \cdots \wedge a_m)$.
Let us note that although $L^m = L \times \cdots \times L$ (m-fold product) is clearly a vector
space, we by no means assert that the mapping

$$(a_1, \dots, a_m) \mapsto [a_1, \dots, a_m]$$

discussed in Property 10.18 is a linear transformation $L^m \to M$. In general, such is
not the case. For example, the exterior product $a_1 \wedge \cdots \wedge a_m: L^m \to \Lambda^m(L)$ itself
is not a linear transformation in the case that dim L > m + 1 and m > 1. Indeed, the
image of the exterior product is the set of decomposable vectors described by their
Plücker relations, which is not a vector subspace of $\Lambda^m(L)$.


Proof of Property 10.18 We can construct a linear transformation $\Psi: M^* \to \Omega^m(L)$
such that it maps every linear function $f \in M^*$ to the function $\Psi(f) \in \Omega^m(L)$
defined by the relationship

$$\Psi(f) = f([a_1, \dots, a_m]). \tag{10.37}$$

By Properties 10.11–10.13, which, by assumption, are satisfied by $[a_1, \dots, a_m]$,
the mapping $\Psi(f)$ thus constructed is a multilinear and antisymmetric function of
$a_1, \dots, a_m$. Therefore, $\Psi: M^* \to \Omega^m(L)$ is a linear transformation. Let us define $\mathcal{A}$
as the dual mapping

$$\mathcal{A} = \Psi^*: \Lambda^m(L) = \Omega^m(L)^* \to M = M^{**}.$$


By definition of the dual transformation (formula (3.58)), for every linear function
F on the space $\Omega^m(L)$, its image $\mathcal{A}(F)$ is a linear function on the space $M^*$
such that $\mathcal{A}(F)(f) = F(\Psi(f))$ for all $f \in M^*$. Applying formula (10.37) to the
right-hand side of the last equality, we obtain the equality

$$\mathcal{A}(F)(f) = F(\Psi(f)) = F(f([a_1, \dots, a_m])). \tag{10.38}$$

Setting in (10.38) the function $F(\Psi) = \Psi(a_1, \dots, a_m)$, that is, $F = a_1 \wedge \cdots \wedge a_m$,
we arrive at the relationship

$$\mathcal{A}(a_1 \wedge \cdots \wedge a_m)(f) = f([a_1, \dots, a_m]), \tag{10.39}$$

whose left-hand side is an element of the space $M^{**}$, which is isomorphic to M.

Let us recall that the identification (isomorphism) of the spaces $M^{**}$ and M can
be obtained by mapping each vector $\psi(f) \in M^{**}$ to the vector $x \in M$ for which the
equality $f(x) = \psi(f)$ is satisfied for every linear function $f \in M^*$. Then formula
(10.39) gives the relationship

$$f(\mathcal{A}(a_1 \wedge \cdots \wedge a_m)) = f([a_1, \dots, a_m]),$$

which is valid for every function $f \in M^*$. Consequently, from this we obtain the
required relationship

$$\mathcal{A}(a_1 \wedge \cdots \wedge a_m) = [a_1, \dots, a_m]. \tag{10.40}$$

Equality (10.40) defines a linear transformation for all decomposable vectors
$x \in \Lambda^m(L)$. But above, we saw that every m-vector is a linear combination
of decomposable vectors. The transformation $\mathcal{A}$ is linear, and therefore, it is
uniquely defined for all m-vectors. Thus we obtain the required linear transformation
$\mathcal{A}: \Lambda^m(L) \to M$.


10.4 Exterior Algebras* 


In many branches of mathematics, an important role is played by the expression

$$a_1 \wedge \cdots \wedge a_m,$$

understood not so much as a function of m vectors $a_1, \dots, a_m$ of the space L with
values in $\Lambda^m(L)$, but more as the result of repeated (m-fold) application of the
operation consisting in mapping two vectors $x \in \Lambda^p(L)$ and $y \in \Lambda^q(L)$ to the vector
$x \wedge y \in \Lambda^{p+q}(L)$. For example, the expression $a \wedge b \wedge c$ can then be calculated
“by parts.” That is, it can be represented in the form $a \wedge b \wedge c = (a \wedge b) \wedge c$ and
computed by first calculating $a \wedge b$, and then $(a \wedge b) \wedge c$.

To accomplish this, we have first to define the function mapping two vectors
$x \in \Lambda^p(L)$ and $y \in \Lambda^q(L)$ to the vector $x \wedge y \in \Lambda^{p+q}(L)$. As a first step, such a function
$x \wedge y$ will be defined for the case that the vector $y \in \Lambda^q(L)$ is decomposable, that
is, representable in the form

$$y = a_1 \wedge a_2 \wedge \cdots \wedge a_q, \quad a_i \in L. \tag{10.41}$$

Let us consider the mapping that assigns to p vectors $b_1, \dots, b_p$ of the space L
the vector

$$[b_1, \dots, b_p] = b_1 \wedge \cdots \wedge b_p \wedge a_1 \wedge \cdots \wedge a_q,$$

and let us apply to it Property 10.18 (universality) from the previous section. We
thereby obtain the diagram

$$\begin{array}{ccc}
L^p & \longrightarrow & \Lambda^{p+q}(L) \\
\downarrow & \nearrow {\scriptstyle \mathcal{A}} & \\
\Lambda^p(L) & &
\end{array} \tag{10.42}$$

In this diagram,

$$\mathcal{A}(b_1 \wedge \cdots \wedge b_p) = [b_1, \dots, b_p].$$
Definition 10.19 Let y be a decomposable vector, that is, it can be written in the
form (10.41). Then for every vector $x \in \Lambda^p(L)$, its image $\mathcal{A}(x)$ for the transformation
$\mathcal{A}: \Lambda^p(L) \to \Lambda^{p+q}(L)$ constructed above is denoted by
$x \wedge y = x \wedge (a_1 \wedge \cdots \wedge a_q)$ and is called the exterior product of vectors x and y.


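Thus as a first step, we defined $x \wedge y$ in the case that the vector y is
decomposable. In order to define $x \wedge y$ for an arbitrary vector $y \in \Lambda^q(L)$, it
suffices simply to repeat the same argument. Indeed, let us consider the mapping
$(a_1, \dots, a_q) \mapsto [a_1, \dots, a_q]$ of $L^q$ into $\Lambda^{p+q}(L)$ defined by the formula

$$[a_1, \dots, a_q] = x \wedge (a_1 \wedge \cdots \wedge a_q).$$

We again obtain, on the basis of Property 10.18, the same diagram:

$$\begin{array}{ccc}
L^q & \longrightarrow & \Lambda^{p+q}(L) \\
\downarrow & \nearrow {\scriptstyle \mathcal{A}} & \\
\Lambda^q(L) & &
\end{array} \tag{10.43}$$

where the transformation $\mathcal{A}: \Lambda^q(L) \to \Lambda^{p+q}(L)$ is defined by the formula
$\mathcal{A}(a_1 \wedge \cdots \wedge a_q) = [a_1, \dots, a_q]$.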


Definition 10.20 For any vectors $x \in \Lambda^p(L)$ and $y \in \Lambda^q(L)$, the exterior product
$x \wedge y$ is the vector $\mathcal{A}(y) \in \Lambda^{p+q}(L)$ in diagram (10.43) constructed above.




Let us note some properties of the exterior product that follow from this defini- 
tion. 


Property 10.21 For any vectors $x_1, x_2 \in \Lambda^p(L)$ and $y \in \Lambda^q(L)$, we have the
relationship

$$(x_1 + x_2) \wedge y = x_1 \wedge y + x_2 \wedge y.$$

Similarly, for any vectors $x \in \Lambda^p(L)$ and $y \in \Lambda^q(L)$ and any scalar $\alpha$, we have the
relationship

$$(\alpha x) \wedge y = \alpha (x \wedge y).$$

Both equalities follow immediately from the definitions and the linearity of the
transformation $\mathcal{A}$ in diagram (10.43).
Property 10.22 For any vectors $x \in \Lambda^p(L)$ and $y_1, y_2 \in \Lambda^q(L)$, we have the
relationship

$$x \wedge (y_1 + y_2) = x \wedge y_1 + x \wedge y_2.$$

Similarly, for any vectors $x \in \Lambda^p(L)$ and $y \in \Lambda^q(L)$ and any scalar $\alpha$, we have the
relationship

$$x \wedge (\alpha y) = \alpha (x \wedge y).$$

Both equalities follow immediately from the definitions and the linearity of the
transformations $\mathcal{A}$ in diagrams (10.42) and (10.43).


Property 10.23 For decomposable vectors $x = a_1 \wedge \cdots \wedge a_p$ and $y = b_1 \wedge \cdots \wedge b_q$,
we have the relationship

$$x \wedge y = a_1 \wedge \cdots \wedge a_p \wedge b_1 \wedge \cdots \wedge b_q.$$

This follows at once from the definition.


Let us note that we have actually defined the exterior product in such a way 
that Properties 10.21–10.23 are satisfied. Indeed, Property 10.23 defines the exterior
product of decomposable vectors. And since every vector is a linear combination of 
decomposable vectors, it follows that Properties 10.21 and 10.22 define it in the gen- 
eral case. The property of universality of the exterior product has been necessary for 
verifying that the result x A y does not depend on the choice of linear combinations 
of decomposable vectors that we use to represent the vectors x and y. 

Finally, let us make note of the following equally simple property. 


Property 10.24 For any vectors $x \in \Lambda^p(L)$ and $y \in \Lambda^q(L)$, we have the relationship

$$x \wedge y = (-1)^{pq}\, y \wedge x. \tag{10.44}$$

Both vectors on the right- and left-hand sides of equality (10.44) belong to the space
$\Lambda^{p+q}(L)$, that is, by definition, they are linear functions on $\Omega^{p+q}(L)$. Since every
vector is a linear combination of decomposable vectors, it suffices that we verify
equality (10.44) for decomposable vectors.

Let $x = a_1 \wedge \cdots \wedge a_p$, $y = b_1 \wedge \cdots \wedge b_q$, and let F be any vector of the space
$\Omega^{p+q}(L)$, that is, F is an antisymmetric function of the vectors $x_1, \dots, x_{p+q}$ in L.
Then equality (10.44) means that

$$F(a_1, \dots, a_p, b_1, \dots, b_q) = (-1)^{pq} F(b_1, \dots, b_q, a_1, \dots, a_p). \tag{10.45}$$

But equality (10.45) is an obvious consequence of the antisymmetry of the function F.
Indeed, in order to place the vector $b_1$ in the first position on the left-hand
side of (10.45), we must change the position of $b_1$ with each vector $a_1, \dots, a_p$
in turn. One such transposition reverses the sign, and altogether, the transpositions
multiply F by $(-1)^p$. Similarly, in order to place the vector $b_2$ in the second
position on the left-hand side of (10.45), we also must execute p transpositions, and the
value of F is again multiplied by $(-1)^p$. And in order to place all vectors $b_1, \dots, b_q$
at the beginning, it is necessary to multiply F by $(-1)^p$ a total of q times, and this
ends up as (10.45).
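
The counting argument in this proof can be replayed mechanically. The following sketch (the function name `block_swap_sign` and the labels "a", "b" are ours, used only for illustration) moves each $b_j$ to the front by adjacent transpositions, exactly as described above, and records the accumulated sign.

```python
def block_swap_sign(p, q):
    """Sign picked up when (a_1..a_p, b_1..b_q) is rearranged into (b_1..b_q, a_1..a_p).

    Each b_j is moved left past the a's one adjacent transposition at a time,
    and every transposition reverses the sign.
    """
    seq = [("a", i) for i in range(p)] + [("b", j) for j in range(q)]
    sign = 1
    for j in range(q):
        pos = seq.index(("b", j))
        while pos > j:  # b_j still has a vector a_i to its left
            seq[pos - 1], seq[pos] = seq[pos], seq[pos - 1]
            sign = -sign
            pos -= 1
    # The b's now come first, in their original order, followed by the a's.
    assert seq == [("b", j) for j in range(q)] + [("a", i) for i in range(p)]
    return sign


# Compare with the factor (-1)^(pq) from (10.44) for a few small degrees.
checks = {(p, q): block_swap_sign(p, q) == (-1) ** (p * q)
          for p in range(5) for q in range(5)}
print(all(checks.values()))
```

Each $b_j$ passes exactly p vectors, so the loop performs pq transpositions in total, which is where the exponent in (10.44) comes from.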


Our next step consists in uniting all the sets $\Lambda^p(L)$ into a single set $\Lambda(L)$ and
defining the exterior product for its elements. Here we encounter a special case of a
very important algebraic notion, that of an algebra.²


Definition 10.25 An algebra (over some field $\mathbb{K}$, which we shall consider to consist
of numbers) is a vector space A on which, besides the operations of addition of
vectors and multiplication of a vector by a scalar, is also defined the operation
$A \times A \to A$, called the product, assigning to every pair of elements $a, b \in A$ the element
$ab \in A$ and satisfying the following conditions:

(1) the distributive property: for all $a, b, c \in A$, we have the relationship

$$(a + b)c = ac + bc, \quad c(a + b) = ca + cb; \tag{10.46}$$

(2) for all $a, b \in A$ and every scalar $\alpha \in \mathbb{K}$, we have the relationship

$$(\alpha a)b = a(\alpha b) = \alpha(ab); \tag{10.47}$$

(3) there exists an element $e \in A$, called the identity, such that for every $a \in A$, we
have ea = a and ae = a.


Let us note that there can be only one identity element in an algebra. Indeed, 
if there existed another identity element e’, then by definition, we would have the 
equalities ee’ = e’ and ee’ = e, from which it follows that e = e’. 


² This is not a very felicitous term, since it coincides with the name of a branch of mathematics, the
one we are currently studying. But the term has taken root, and we are stuck with it.




As in any vector space, in an algebra we have, for every $a \in A$, the equality
$0 \cdot a = 0$ (here the 0 on the left denotes the scalar zero in the field $\mathbb{K}$, while the 0 on
the right denotes the null element of the vector space A that is an algebra).

If an algebra A is finite-dimensional as a vector space and $e_1, \dots, e_n$ is a basis of
A, then the elements $e_1, \dots, e_n$ are said to form a basis of the algebra A, where the
number n is called its dimension and is denoted by dim A = n. For an algebra A of
finite dimension n, the product of two of its basis elements can be represented in the
form

$$e_i e_j = \sum_{k=1}^{n} \alpha_{ij}^{k} e_k, \quad i, j = 1, \dots, n, \tag{10.48}$$

where $\alpha_{ij}^{k} \in \mathbb{K}$ are certain scalars.


The totality of all scalars $\alpha_{ij}^{k}$ for all i, j, k = 1, ..., n is called the multiplication
table of the algebra A, and it uniquely determines the product for all the elements
of the algebra. Indeed, if $x = \lambda_1 e_1 + \cdots + \lambda_n e_n$ and $y = \mu_1 e_1 + \cdots + \mu_n e_n$, then
repeatedly applying the rules (10.46) and (10.47) and taking into account (10.48),
we obtain

$$xy = \sum_{i,j,k=1}^{n} \lambda_i \mu_j \alpha_{ij}^{k} e_k, \tag{10.49}$$

that is, the product xy is uniquely determined by the coordinates of the vectors x, y
and the multiplication table of the algebra A. And conversely, it is obvious that for
any given multiplication table, formula (10.49) defines in an n-dimensional vector
space an operation of multiplication satisfying all the requirements entering into the
definition of an algebra, except, perhaps, property 3, which requires further
consideration; that is, it converts this vector space into an algebra of the same dimension n.
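
Formula (10.49) translates directly into a few lines of code. As an illustration that is not taken from the text, the sketch below stores a multiplication table as a nested list `table[i][j][k]` standing for $\alpha_{ij}^{k}$ (0-based indices) and uses the complex numbers, viewed as a two-dimensional algebra over the real numbers with basis $e_1 = 1$, $e_2 = i$, as the example; the names `multiply` and `COMPLEX` are ours.

```python
def multiply(x, y, table):
    """Product by formula (10.49): the k-th coordinate of xy is
    the sum over i, j of x_i * y_j * table[i][j][k]."""
    n = len(x)
    return [sum(x[i] * y[j] * table[i][j][k] for i in range(n) for j in range(n))
            for k in range(n)]


# Multiplication table of the complex numbers in the basis e_1 = 1, e_2 = i.
COMPLEX = [
    [[1, 0], [0, 1]],    # e1*e1 = e1,  e1*e2 = e2
    [[0, 1], [-1, 0]],   # e2*e1 = e2,  e2*e2 = -e1
]

z = multiply([1, 2], [3, 4], COMPLEX)       # (1 + 2i)(3 + 4i)
unit_left = multiply([1, 0], [5, -7], COMPLEX)
unit_right = multiply([5, -7], [1, 0], COMPLEX)
print(z, unit_left, unit_right)
```

Here `z` comes out as the coordinates of −5 + 10i, and multiplying by the coordinate vector of $e_1$ on either side returns the other factor, so for this particular table property 3 holds with $e = e_1$.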


Definition 10.26 An algebra A is said to be associative if for every collection of
three elements a, b, and c, we have the relationship

$$(ab)c = a(bc). \tag{10.50}$$

The associative property makes it possible to calculate the product of any number
of elements $a_1, \dots, a_m$ of an algebra A without indicating the arrangement of
parentheses among them; see the discussion on p. xv. Clearly, it suffices to verify
the associative property of a finite-dimensional algebra for elements of some basis.

We have already encountered some examples of algebras. 


Example 10.27 The algebra of all square matrices of order n. It has the finite
dimension $n^2$, and as we saw in Sect. 2.9, it is associative.


Example 10.28 The algebra of all polynomials in n > 0 variables with numeric 
coefficients. This algebra is also associative, but its dimension is infinite. 




Now we shall define for a vector space L of finite dimension n its exterior algebra
$\Lambda(L)$. This algebra has many different applications (some of them will be discussed
in the following section); its introduction is one more reason why in Sect. 10.3, we
did not limit our consideration to decomposable vectors only, which were sufficient
for describing vector subspaces.

Let us define the exterior algebra $\Lambda(L)$ as a direct sum of spaces $\Lambda^p(L)$, $p \geq 0$,
which consist of more than just the one null vector, where $\Lambda^0(L)$ is by definition
equal to $\mathbb{K}$. Since as a result of the antisymmetry of the exterior product we have
$\Lambda^p(L) = (0)$ for all p > n, we obtain the following definition of an exterior algebra:

$$\Lambda(L) = \Lambda^0(L) \oplus \Lambda^1(L) \oplus \cdots \oplus \Lambda^n(L). \tag{10.51}$$

Thus every element u of the constructed vector space $\Lambda(L)$ can be represented in
the form $u = u_0 + u_1 + \cdots + u_n$, where $u_i \in \Lambda^i(L)$.

Our present goal is the definition of the exterior product in $\Lambda(L)$, which we
denote by $u \wedge v$ for arbitrary vectors $u, v \in \Lambda(L)$. We shall define the exterior product
$u \wedge v$ of vectors

$$u = u_0 + u_1 + \cdots + u_n, \quad v = v_0 + v_1 + \cdots + v_n, \quad u_i, v_i \in \Lambda^i(L),$$

as the element

$$u \wedge v = \sum_{i,j=0}^{n} u_i \wedge v_j,$$

where we use the fact that the exterior product $u_i \wedge v_j$ is already defined as an
element of the space $\Lambda^{i+j}(L)$. Thus

$$u \wedge v = w_0 + w_1 + \cdots + w_n, \quad \text{where } w_k = \sum_{i+j=k} u_i \wedge v_j, \quad w_k \in \Lambda^k(L).$$

A simple verification shows that for the exterior product thus defined, all the
conditions for the definition of an algebra are satisfied. This follows at once from the
properties of the exterior product $x \wedge y$ of vectors $x \in \Lambda^i(L)$ and $y \in \Lambda^j(L)$ proved
earlier. By definition, $\Lambda^0(L) = \mathbb{K}$, and the number 1 (the identity in the field $\mathbb{K}$) is
the identity in the exterior algebra $\Lambda(L)$.


Definition 10.29 A finite-dimensional algebra A is called a graded algebra if there
is given a decomposition of the vector space A into a direct sum of subspaces $A_i \subset A$,

$$A = A_0 \oplus A_1 \oplus \cdots \oplus A_k, \tag{10.52}$$

and the following conditions are satisfied: for all vectors $x \in A_i$ and $y \in A_j$, the
product xy is in $A_{i+j}$ if $i + j \leq k$, and xy = 0 if $i + j > k$. Here the decomposition
(10.52) is called a grading.




In this case, $\dim A = \dim A_0 + \cdots + \dim A_k$, and taking the union of the bases of
the subspaces $A_i$, we obtain a basis of the space A. The decomposition (10.51) and
the definition of the exterior product show that the exterior algebra $\Lambda(L)$ is graded if
the space L has finite dimension n. Since $\Lambda^p(L) = (0)$ for all p > n, it follows that

$$\dim \Lambda(L) = \sum_{p=0}^{n} \dim \Lambda^p(L) = \sum_{p=0}^{n} C_n^p = 2^n.$$
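
For instance, for n = 3 and a basis $e_1, e_2, e_3$ of L (a small worked case added here for illustration, using the bases of the spaces $\Lambda^m(L)$ described in Sect. 10.3), the bases of the homogeneous components can be listed explicitly:

```latex
\begin{align*}
\Lambda^0(L) &: \quad 1, \\
\Lambda^1(L) &: \quad e_1,\ e_2,\ e_3, \\
\Lambda^2(L) &: \quad e_1 \wedge e_2,\ e_1 \wedge e_3,\ e_2 \wedge e_3, \\
\Lambda^3(L) &: \quad e_1 \wedge e_2 \wedge e_3,
\end{align*}
\[
\dim \Lambda(L) = 1 + 3 + 3 + 1 = 8 = 2^3 .
\]
```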


In an arbitrary graded algebra A with grading (10.52), the elements of the subspace
$A_i$ are called homogeneous elements of degree i, and for every $u \in A_i$, we write
i = deg u. One often encounters graded algebras of infinite dimension, and in this
case, the grading (10.52) contains, in general, not a finite, but an infinite number
of terms. For example, in the algebra of polynomials (Example 10.28), a grading is
defined by the decomposition of a polynomial into homogeneous components.
Property (10.44) of the exterior product that we have proved shows that in an
exterior algebra $\Lambda(L)$, we have for all homogeneous elements u and v the relationship

$$u \wedge v = (-1)^d\, v \wedge u, \quad \text{where } d = \deg u \, \deg v. \tag{10.53}$$


Let us prove that for every finite-dimensional vector space L, the exterior algebra
$\Lambda(L)$ is associative. As we noted above, it suffices to prove the associative property
for some basis of the algebra. Such a basis can be constructed out of homogeneous
elements, and we may even choose them to be decomposable. Thus we may suppose
that the elements $a, b, c \in \Lambda(L)$ are equal to

$$a = a_1 \wedge \cdots \wedge a_p, \quad b = b_1 \wedge \cdots \wedge b_q, \quad c = c_1 \wedge \cdots \wedge c_r,$$

and in this case, using the properties proved above, we obtain

$$a \wedge (b \wedge c) = a_1 \wedge \cdots \wedge a_p \wedge b_1 \wedge \cdots \wedge b_q \wedge c_1 \wedge \cdots \wedge c_r = (a \wedge b) \wedge c.$$


An associative graded algebra that satisfies relationship (10.53) for all pairs of
homogeneous elements is called a superalgebra. Thus an exterior algebra $\Lambda(L)$ of
an arbitrary finite-dimensional vector space L is a superalgebra, and it is the most
important example of this concept.

Let us now return to the exterior algebra $\Lambda(L)$ of the finite-dimensional vector
space L. Let us choose in it a convenient basis and determine its multiplication table.

Let us fix in the space L an arbitrary basis $e_1, \dots, e_n$. Since the elements
$\varphi_I = e_{i_1} \wedge \cdots \wedge e_{i_m}$ for all possible collections $I = (i_1, \dots, i_m)$ in $\overrightarrow{\mathbb{N}}_n^m$ form a
basis of the space $\Lambda^m(L)$, m > 0, it follows from decomposition (10.51) that a
basis in $\Lambda(L)$ is obtained as the union of the bases of the subspaces $\Lambda^m(L)$ for
all m = 1, ..., n and the basis of the subspace $\Lambda^0(L) = \mathbb{K}$, consisting of a single
nonnull scalar, for example 1. This means that all such elements $\varphi_I$, $I \in \overrightarrow{\mathbb{N}}_n^m$,
m = 1, ..., n, together with 1 form a basis of the exterior algebra $\Lambda(L)$. Since the
exterior product with 1 is trivial, it follows that in order to compose a multiplication
table in the constructed basis, we must find the exterior product $\varphi_I \wedge \varphi_J$ for all
possible collections of indices $I \in \overrightarrow{\mathbb{N}}_n^p$ and $J \in \overrightarrow{\mathbb{N}}_n^q$ for all $1 \leq p, q \leq n$.
In view of Property 10.23 on page 369, the exterior product $\varphi_I \wedge \varphi_J$ is equal to

$$\varphi_I \wedge \varphi_J = e_{i_1} \wedge \cdots \wedge e_{i_p} \wedge e_{j_1} \wedge \cdots \wedge e_{j_q}. \tag{10.54}$$


Here there are two possibilities. If the collections I and J contain at least one
index in common, then by Corollary 10.14 (p. 363), the product (10.54) is equal to
zero.

If, on the other hand, $I \cap J = \varnothing$, then we shall denote by K the collection in
$\overrightarrow{\mathbb{N}}_n^{p+q}$ comprising the indices belonging to the set $I \cup J$, that is, in other words, K
is obtained by arranging the collection $(i_1, \dots, i_p, j_1, \dots, j_q)$ in ascending order.
Then, as is easily verified, the exterior product (10.54) differs from the element
$\varphi_K$, $K \in \overrightarrow{\mathbb{N}}_n^{p+q}$, belonging to the basis of the exterior algebra $\Lambda(L)$ constructed
above in that the indices of the collection $I \cup J$ are not necessarily arranged in
ascending order. In order to obtain from (10.54) the element $\varphi_K$, $K \in \overrightarrow{\mathbb{N}}_n^{p+q}$, it is
necessary to interchange the indices $(i_1, \dots, i_p, j_1, \dots, j_q)$ in such a way that the
resulting collection is increasing. Then by Theorems 2.23 and 2.25 from Sect. 2.6
and Property 10.13, according to which the exterior product changes sign under the
transposition of any two vectors, we obtain that

$$\varphi_I \wedge \varphi_J = \varepsilon(I, J)\, \varphi_K, \quad K \in \overrightarrow{\mathbb{N}}_n^{p+q},$$

where the number $\varepsilon(I, J)$ is equal to +1 or −1 depending on whether the number
of transpositions necessary for passing from $(i_1, \dots, i_p, j_1, \dots, j_q)$ to the collection
$K \in \overrightarrow{\mathbb{N}}_n^{p+q}$ is even or odd.
As a result, we see that in the constructed basis of the exterior algebra $\Lambda(L)$, the
multiplication table assumes the following form:

$$\varphi_I \wedge \varphi_J = \begin{cases} 0 & \text{if } I \cap J \neq \varnothing, \\ \varepsilon(I, J)\, \varphi_K & \text{if } I \cap J = \varnothing. \end{cases} \tag{10.55}$$
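
The multiplication table (10.55) is easy to turn into a small program. The sketch below is only one possible encoding, chosen by us for illustration: an element of $\Lambda(L)$ is a dictionary mapping increasing index tuples I to coefficients, the empty tuple `()` stands for the scalar 1, and the sign $\varepsilon(I, J)$ is computed as the parity of the number of inversions in the concatenated collection (which has the same parity as any sequence of transpositions that sorts it).

```python
def basis_product(I, J):
    """Return (sign, K) with phi_I ^ phi_J = sign * phi_K, or (0, None) if I and J overlap."""
    if set(I) & set(J):
        return 0, None
    seq = list(I) + list(J)
    # Parity of the inversions of (i_1..i_p, j_1..j_q) gives epsilon(I, J).
    inversions = sum(1 for a in range(len(seq)) for b in range(a + 1, len(seq))
                     if seq[a] > seq[b])
    return (-1) ** inversions, tuple(sorted(seq))


def wedge(u, v):
    """Exterior product of two elements given as {increasing index tuple: coefficient}."""
    result = {}
    for I, a in u.items():
        for J, b in v.items():
            sign, K = basis_product(I, J)
            if sign:
                result[K] = result.get(K, 0) + sign * a * b
    return {K: c for K, c in result.items() if c != 0}


one = {(): 1}
e1, e2, e3 = {(1,): 1}, {(2,): 1}, {(3,): 1}

print(wedge(e1, e2), wedge(e2, e1), wedge(e1, e1))
print(wedge(wedge(e2, e3), e1), wedge(e2, wedge(e3, e1)))
```

With this encoding, `wedge(e2, e1)` is the negative of `wedge(e1, e2)`, `wedge(e1, e1)` is the empty dictionary (the zero element), and the two ways of bracketing $e_2 \wedge e_3 \wedge e_1$ agree, as associativity requires.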


10.5 Appendix* 


The exterior product $x \wedge y$ of vectors $x \in \Lambda^p(L)$ and $y \in \Lambda^q(L)$ defined in the
previous section makes it possible in many cases to give simple proofs of assertions
that we encountered earlier.


Example 10.30 Let us consider the case p = n, using the notation and results of the
previous section. As we have seen, $\dim \Lambda^p(L) = C_n^p$, and therefore, the space $\Lambda^n(L)$
is one-dimensional, and each of its nonzero vectors constitutes a basis. If e is such
a vector, then an arbitrary vector of the space $\Lambda^n(L)$ can be written in the form $\alpha e$
with a suitable scalar $\alpha$. Thus for any n vectors $x_1, \dots, x_n$ of the space L, we obtain
the relationship

$$x_1 \wedge \cdots \wedge x_n = \alpha(x_1, \dots, x_n)\, e, \tag{10.56}$$

where $\alpha(x_1, \dots, x_n)$ is some function of n vectors taking numeric values from the
field $\mathbb{K}$. By Properties 10.11, 10.12, and 10.13, this function is multilinear and
antisymmetric.

Let us choose in the space L some basis $e_1, \dots, e_n$ and set

$$x_i = x_{i1} e_1 + \cdots + x_{in} e_n, \quad i = 1, \dots, n.$$

The choice of a basis defines an isomorphism of the space L and the space $\mathbb{K}^n$ of
rows of length n, in which the vector $x_i$ corresponds to the row $(x_{i1}, \dots, x_{in})$. Thus
$\alpha$ becomes a multilinear and antisymmetric function of n rows taking numeric
values. By Theorem 2.15, the function $\alpha(x_1, \dots, x_n)$ coincides up to a scalar multiple
k(e) with the determinant of the square matrix of order n consisting of the
coordinates $x_{ij}$ of the vectors $x_1, \dots, x_n$:

$$\alpha(x_1, \dots, x_n) = k(e) \cdot \begin{vmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nn} \end{vmatrix}. \tag{10.57}$$

The arbitrariness of the choice of coefficient k(e) in formula (10.57) corresponds to
the arbitrariness of the choice of basis e in the one-dimensional space $\Lambda^n(L)$ (let us
recall that the basis $e_1, \dots, e_n$ of the space L is fixed).

In particular, let us choose as basis of the space $\Lambda^n(L)$ the vector

$$e = e_1 \wedge \cdots \wedge e_n. \tag{10.58}$$

Vectors $e_1, \dots, e_n$ are linearly independent. Therefore, by Property 10.15 (p. 363),
the vector e is nonnull. We therefore obviously obtain that k(e) = 1. Indeed, since
the coefficient k(e) in formula (10.57) is one and the same for all collections of
vectors $x_1, \dots, x_n$, we can calculate it by setting $x_i = e_i$, i = 1, ..., n. Comparing in
this case formulas (10.56) and (10.58), we see that $\alpha(e_1, \dots, e_n) = 1$. Substituting
this value into relationship (10.57) for $x_i = e_i$, i = 1, ..., n, and noting that the
determinant on the right-hand side of (10.57) is the determinant of the identity matrix,
that is, equal to 1, we conclude that k(e) = 1.

Using definitions given earlier, we may associate the linear transformation
$\Lambda^n(\mathcal{A}): \Lambda^n(L) \to \Lambda^n(L)$ with the linear transformation $\mathcal{A}: L \to L$. The
transformation $\mathcal{A}$ can be defined by indicating to which vectors $x_1, \dots, x_n$ it takes the basis
$e_1, \dots, e_n$ of the space L, that is, by specifying vectors $x_i = \mathcal{A}(e_i)$, i = 1, ..., n. By
Lemma 10.17 (p. 365), we have the equality

$$\Lambda^n(\mathcal{A})(e_1 \wedge \cdots \wedge e_n) = \mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{A}(e_n)
= x_1 \wedge \cdots \wedge x_n = \alpha(x_1, \dots, x_n)\, e. \tag{10.59}$$




On the other hand, as we know, all linear transformations of a one-dimensional space have the form $x \mapsto \alpha x$, where $\alpha$ is some scalar equal to the determinant of the given transformation and independent of the choice of basis $e$ in $\Lambda^n(\mathsf{L})$. Thus we obtain that $(\Lambda^n(\mathcal{A}))(x) = \alpha x$, where the scalar $\alpha$ is equal to the determinant $|\Lambda^n(\mathcal{A})|$ and clearly depends only on the transformation $\mathcal{A}$ itself, that is, it is determined by the collection of vectors $x_i = \mathcal{A}(e_i)$, $i = 1, \dots, n$. It is not difficult to see that this scalar $\alpha$ coincides with the function $a(x_1, \dots, x_n)$ defined above. Indeed, let us choose in the space $\Lambda^n(\mathsf{L})$ the basis $e = e_1 \wedge \cdots \wedge e_n$. Then the required equality follows directly from formula (10.59).

Further, substituting into (10.59) expression (10.57) for $a(x_1, \dots, x_n)$, taking into account that $k(e) = 1$ and that the determinant on the right-hand side of (10.57) coincides with the determinant of the transformation $\mathcal{A}$, we obtain the following result:

$$\mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{A}(e_n) = |\mathcal{A}|\,(e_1 \wedge \cdots \wedge e_n). \tag{10.60}$$


This relationship gives the most invariant definition of the determinant of a linear 
transformation among all those that we have encountered. 
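
Relationship (10.60) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the helper names `perm_sign`, `top_wedge_coefficient`, and `det_by_elimination` are hypothetical, the matrix `A` is an arbitrary example, and exact rational arithmetic is used. It expands the exterior product of the images of the basis vectors by multilinearity and antisymmetry, and compares the resulting coefficient of $e_1 \wedge \cdots \wedge e_n$ with a determinant computed independently by elimination.

```python
from fractions import Fraction
from itertools import permutations


def perm_sign(p):
    """Sign of a permutation p of (0, ..., n-1), found by counting inversions."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1


def top_wedge_coefficient(images):
    """Coefficient of e_1 ^ ... ^ e_n in images[0] ^ ... ^ images[n-1].

    Each images[j] is the coordinate list of a vector in the basis e_1..e_n.
    Only products that pick pairwise distinct basis vectors survive, and
    reordering them to e_1 ^ ... ^ e_n contributes the sign of the permutation.
    """
    n = len(images)
    total = Fraction(0)
    for p in permutations(range(n)):
        term = Fraction(perm_sign(p))
        for j in range(n):
            term *= images[j][p[j]]
        total += term
    return total


def det_by_elimination(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap changes the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


# Example matrix (an assumption for illustration); column j holds A(e_j).
A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
images = [[row[j] for row in A] for j in range(3)]
wedge_coeff = top_wedge_coefficient(images)
elim_det = det_by_elimination(A)
print(wedge_coeff, elim_det)
```

The two numbers agree, as (10.60) predicts; the permutation expansion is exponential in $n$ and is meant only for small illustrative cases.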

We obtained relationship (10.60) for an arbitrary basis $e_1, \dots, e_n$ of the space $\mathsf{L}$, that is, for any $n$ linearly independent vectors of the space. But it is also true for any $n$ linearly dependent vectors $a_1, \dots, a_n$ of this space. Indeed, in this case, the vectors $\mathcal{A}(a_1), \dots, \mathcal{A}(a_n)$ are clearly also linearly dependent, and by Property 10.15, both exterior products $a_1 \wedge \cdots \wedge a_n$ and $\mathcal{A}(a_1) \wedge \cdots \wedge \mathcal{A}(a_n)$ are equal to zero. Thus for any $n$ vectors $a_1, \dots, a_n$ of the space $\mathsf{L}$ and any linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$, we have the relationship

$$\mathcal{A}(a_1) \wedge \cdots \wedge \mathcal{A}(a_n) = |\mathcal{A}|\,(a_1 \wedge \cdots \wedge a_n). \tag{10.61}$$


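In particular, if $\mathcal{B} : \mathsf{L} \to \mathsf{L}$ is some other linear transformation, then formula (10.60) for the transformation $\mathcal{BA} : \mathsf{L} \to \mathsf{L}$ gives the analogous equality

$$\mathcal{BA}(e_1) \wedge \cdots \wedge \mathcal{BA}(e_n) = |\mathcal{BA}|\,(e_1 \wedge \cdots \wedge e_n).$$

On the other hand, applying formula (10.61) to $\mathcal{B}$ and the vectors $\mathcal{A}(e_i)$, and then formula (10.60) to $\mathcal{A}$, we obtain that

$$\mathcal{B}(\mathcal{A}(e_1)) \wedge \cdots \wedge \mathcal{B}(\mathcal{A}(e_n)) = |\mathcal{B}|\,(\mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{A}(e_n)) = |\mathcal{B}|\,|\mathcal{A}|\,(e_1 \wedge \cdots \wedge e_n).$$

Hence it follows that $|\mathcal{BA}| = |\mathcal{B}| \cdot |\mathcal{A}|$. This is almost a "tautological" proof of Theorem 2.54 on the determinant of the product of square matrices.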

The arguments that we have presented acquire a more concrete character if $\mathsf{L}$ is an oriented Euclidean space. Then as the basis $e_1, \dots, e_n$ in $\mathsf{L}$ we may choose an orthonormal and positively oriented basis. In this case, the basis (10.58) in $\Lambda^n(\mathsf{L})$ is uniquely defined, that is, it does not depend on the choice of the basis $e_1, \dots, e_n$. Indeed, if $e_1', \dots, e_n'$ is another such basis in $\mathsf{L}$, then as we know, there exists a linear transformation $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ such that $e_i' = \mathcal{A}(e_i)$, $i = 1, \dots, n$, and furthermore, the transformation $\mathcal{A}$ is orthogonal and proper. But then $|\mathcal{A}| = 1$, and formula (10.60) shows that $e_1' \wedge \cdots \wedge e_n' = e_1 \wedge \cdots \wedge e_n$.




Example 10.31 Let us show how from the given considerations, we obtain a proof of the Cauchy-Binet formula, which was stated but not proved in Sect. 2.9.

Let us recall that in that section, we considered the product of two matrices $B$ and $A$, the first of type $(m, n)$, and the second of type $(n, m)$, so that $BA$ is a square matrix of order $m$. We are required to obtain an expression for the determinant $|BA|$ in terms of the associated minors of the matrices $B$ and $A$. Minors of the matrices $B$ and $A$ are said to be associated if they are of the same order, namely the minimum of $n$ and $m$, and are located in the columns (of matrix $B$) and rows (of matrix $A$) of identical indices. The Cauchy-Binet formula asserts that the determinant $|BA|$ is equal to 0 if $n < m$, and that $|BA|$ is equal to the sum of the pairwise products over all the associated minors of order $m$ if $n \ge m$.

Since every matrix is the matrix of some linear transformation of vector spaces of suitable dimensions, we may formulate this problem as a question of the determinant of the product of linear transformations $\mathcal{A} : \mathsf{M} \to \mathsf{L}$ and $\mathcal{B} : \mathsf{L} \to \mathsf{M}$, where $\dim \mathsf{L} = n$ and $\dim \mathsf{M} = m$. Here it is assumed that we have chosen a basis $e_1, \dots, e_m$ in the space $\mathsf{M}$ and a basis $f_1, \dots, f_n$ in the space $\mathsf{L}$ such that the transformations $\mathcal{A}$ and $\mathcal{B}$ have matrices $A$ and $B$ respectively in these bases. Then $\mathcal{BA}$ will be a linear transformation of the space $\mathsf{M}$ into itself with determinant $|\mathcal{BA}| = |BA|$.

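Let us first prove that $|BA| = 0$ if $n < m$. Since the image of the transformation, $\mathcal{BA}(\mathsf{M})$, is a subset of $\mathcal{B}(\mathsf{L})$ and $\dim \mathcal{B}(\mathsf{L}) \le \dim \mathsf{L}$, it follows that in the case under consideration, we have the inequality

$$\dim(\mathcal{BA}(\mathsf{M})) \le \dim \mathcal{B}(\mathsf{L}) \le \dim \mathsf{L} = n < m = \dim \mathsf{M},$$

from which it follows that the image of the transformation $\mathcal{BA} : \mathsf{M} \to \mathsf{M}$ is not equal to the entire space $\mathsf{M}$, that is, the transformation $\mathcal{BA}$ is singular. This means that $|\mathcal{BA}| = 0$, that is, $|BA| = 0$.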

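Now let us consider the case $n \ge m$. Using Lemmas 10.16 and 10.17 from Sect. 10.3 with $p = m$, we obtain for the vectors of the basis $e_1, \dots, e_m$ of the space $\mathsf{M}$ the relationship

$$\Lambda^m(\mathcal{BA})(e_1 \wedge \cdots \wedge e_m) = \Lambda^m(\mathcal{B})\,\Lambda^m(\mathcal{A})(e_1 \wedge \cdots \wedge e_m) = \Lambda^m(\mathcal{B})(\mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{A}(e_m)). \tag{10.62}$$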


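The vectors $\mathcal{A}(e_1), \dots, \mathcal{A}(e_m)$ are contained in the space $\mathsf{L}$ of dimension $n$, and their coordinates in the basis $f_1, \dots, f_n$, being written in column form, form the matrix $A$ of the transformation $\mathcal{A} : \mathsf{M} \to \mathsf{L}$. Let us now write the coordinates of the vectors $\mathcal{A}(e_1), \dots, \mathcal{A}(e_m)$ in row form. We thereby obtain the transpose matrix $A^*$ of type $(m, n)$. Applying formula (10.22) to the vectors $\mathcal{A}(e_1), \dots, \mathcal{A}(e_m)$, we obtain the equality

$$\mathcal{A}(e_1) \wedge \cdots \wedge \mathcal{A}(e_m) = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I\, \varphi_I \tag{10.63}$$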


with the functions $\varphi_I$ defined by formula (10.20). In the expression (10.63), according to our definition, $M_I$ is the minor of the matrix $A^*$ occupying columns $i_1, \dots, i_m$. It is obvious that such a minor $M_I$ of the matrix $A^*$ coincides with the minor of the matrix $A$ occupying rows with the same indices $i_1, \dots, i_m$. Thus we may assume that in the sum on the right-hand side of (10.63), $M_I$ are the minors of order $m$ of the matrix $A$ corresponding to all possible ordered collections $I = (i_1, \dots, i_m)$ of indices of its rows.

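Relationships (10.62) and (10.63) together give the equality

$$\Lambda^m(\mathcal{BA})(e_1 \wedge \cdots \wedge e_m) = \Lambda^m(\mathcal{B})\Bigl(\sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I\, \varphi_I\Bigr). \tag{10.64}$$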


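Let us denote by $M_I$ and $N_I$ the associated minors of the matrices $A$ and $B$. This means that the minor $M_I$ occupies the rows of the matrix $A$ with indices $I = (i_1, \dots, i_m)$, and the minor $N_I$ occupies the columns of the matrix $B$ with the same indices. Let us consider the restriction of the linear transformation $\mathcal{B} : \mathsf{L} \to \mathsf{M}$ to the subspace $\langle f_{i_1}, \dots, f_{i_m} \rangle$. By the definition of the functions $\varphi_I$, we obtain that

$$\Lambda^m(\mathcal{B})(\varphi_I) = \mathcal{B}(f_{i_1}) \wedge \cdots \wedge \mathcal{B}(f_{i_m}) = N_I\,(e_1 \wedge \cdots \wedge e_m).$$

From this, taking into account formula (10.64), follows the relationship

$$\begin{aligned} \Lambda^m(\mathcal{BA})(e_1 \wedge \cdots \wedge e_m) &= \Lambda^m(\mathcal{B})\Bigl(\sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I\, \varphi_I\Bigr) \\ &= \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I\, \Lambda^m(\mathcal{B})(\varphi_I) \\ &= \Bigl(\sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I N_I\Bigr)(e_1 \wedge \cdots \wedge e_m). \end{aligned}$$

On the other hand, by Lemma 10.17 and formula (10.60), we have

$$\Lambda^m(\mathcal{BA})(e_1 \wedge \cdots \wedge e_m) = \mathcal{BA}(e_1) \wedge \cdots \wedge \mathcal{BA}(e_m) = |\mathcal{BA}|\,(e_1 \wedge \cdots \wedge e_m).$$

The last two equalities give us the relationship

$$|\mathcal{BA}| = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_I N_I,$$

which, taking into account the equality $|\mathcal{BA}| = |BA|$, coincides with the Cauchy-Binet formula for the case $n \ge m$.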
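
The Cauchy-Binet formula lends itself to a direct numerical check. The following is a minimal sketch, assuming small integer matrices stored as lists of rows; the helper names `det`, `matmul`, and `cauchy_binet_sum` and the matrices `B`, `A` are hypothetical choices made for illustration. It sums the products of associated minors over all increasing index collections and compares the result with the determinant of the product.

```python
from itertools import combinations


def det(m):
    """Determinant by expansion along the first row (fine for small matrices)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def matmul(b, a):
    """Product of the matrices b (m x n) and a (n x m)."""
    return [
        [sum(b[i][k] * a[k][j] for k in range(len(a))) for j in range(len(a[0]))]
        for i in range(len(b))
    ]


def cauchy_binet_sum(b, a):
    """Sum of N_I * M_I over all increasing index collections I of size m.

    M_I is the minor of a in rows I, N_I the minor of b in columns I.
    For n < m there are no such collections and the sum is empty, hence 0.
    """
    m, n = len(b), len(a)
    total = 0
    for idx in combinations(range(n), m):
        minor_a = det([a[i] for i in idx])
        minor_b = det([[row[i] for i in idx] for row in b])
        total += minor_a * minor_b
    return total


# Example data (assumed): B is of type (2, 3), A is of type (3, 2).
B = [[1, 2, 3], [4, 5, 6]]
A = [[1, 0], [2, 1], [0, 3]]
lhs = det(matmul(B, A))
rhs = cauchy_binet_sum(B, A)
print(lhs, rhs)
```

Swapping the roles of the two matrices gives the case $n < m$: the product is then a $3 \times 3$ matrix of rank at most 2, and both the determinant and the (empty) sum are zero.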


Example 10.32 Let us derive the formula for the determinant of a square matrix $A$ that generalizes the well-known formula for the expansion of the determinant along the $j$th column:

$$|A| = a_{1j} A_{1j} + a_{2j} A_{2j} + \cdots + a_{nj} A_{nj}, \tag{10.65}$$

where $A_{ij}$ is the cofactor of the element $a_{ij}$, that is, the number $(-1)^{i+j} M_{ij}$, and $M_{ij}$ is the minor obtained by deleting this element from the matrix $A$ along with the entire row and column at whose intersection it is located. The generalization consists in the fact that now we shall write down an analogous expansion of the determinant not along a single column, but along several, thereby generalizing in a suitable way the notion of the cofactor.


Let us consider a certain collection $I \in \overrightarrow{\mathbb{N}}_n^m$, where $m$ is a natural number in the range 1 to $n - 1$. Let us denote by $\bar{I}$ the collection obtained from $(1, \dots, n)$ by discarding all indices entering into $I$. Clearly, $\bar{I} \in \overrightarrow{\mathbb{N}}_n^{n-m}$. Let us denote by $|I|$ the sum of all indices entering into the collection $I$, that is, we shall set $|I| = i_1 + \cdots + i_m$.


Let $A$ be an arbitrary square matrix of order $n$, and let $I = (i_1, \dots, i_m)$ and $J = (j_1, \dots, j_m)$ be two collections of indices in $\overrightarrow{\mathbb{N}}_n^m$. For the minor $M_{IJ}$ occupying the rows with indices $i_1, \dots, i_m$ and columns with indices $j_1, \dots, j_m$, let us call the number

$$A_{IJ} = (-1)^{|I| + |J|} M_{\bar{I}\bar{J}} \tag{10.66}$$

the cofactor. It is easy to see that the given definition is indeed a generalization of that given in Chap. 2 of the cofactor of a single element $a_{ij}$, for which $m = 1$ and the collections $I = (i)$, $J = (j)$ each consist of a single index.


Theorem 10.33 (Laplace's theorem) The determinant of a matrix $A$ is equal to the sum of the products of all minors occupying any $m$ given columns (or rows) by their cofactors:

$$|A| = \sum_{J \in \overrightarrow{\mathbb{N}}_n^m} M_{IJ} A_{IJ} = \sum_{I \in \overrightarrow{\mathbb{N}}_n^m} M_{IJ} A_{IJ},$$

where the number $m$ can be arbitrarily chosen in the range 1 to $n - 1$.


Remark 10.34 For $m = 1$ and $m = n - 1$, Laplace's theorem gives formula (10.65) for the expansion of the determinant along a column and the analogous formula for expansion along a row. However, only in the general case is it possible to focus our attention on the symmetry between the minors of order $m$ and those of order $n - m$.


Proof of Theorem 10.33 Let us first of all note that since for the transpose matrix, its rows are converted into columns while the determinant is unchanged, it suffices to provide a proof for only one of the given equalities. For definiteness, let us prove the first, namely the formula for the expansion of the determinant $|A|$ along $m$ columns.

Let us consider a vector space $\mathsf{L}$ of dimension $n$ and an arbitrary basis $e_1, \dots, e_n$ of $\mathsf{L}$. Let $\mathcal{A} : \mathsf{L} \to \mathsf{L}$ be a linear transformation having in this basis the matrix $A$. Let us apply to the vectors of this basis a permutation such that the first $m$ positions are occupied by the vectors $e_{i_1}, \dots, e_{i_m}$, and the remaining $n - m$ positions by the vectors $e_{i_{m+1}}, \dots, e_{i_n}$. In the basis thus obtained, the determinant of the transformation $\mathcal{A}$ will again be equal to $|A|$, since the determinant of the matrix of a transformation $\mathcal{A}$ does not depend on the choice of basis. Using formula (10.60), we obtain

$$\begin{aligned} \mathcal{A}(e_{i_1}) \wedge \cdots \wedge \mathcal{A}(e_{i_m}) \wedge \mathcal{A}(e_{i_{m+1}}) \wedge \cdots \wedge \mathcal{A}(e_{i_n}) &= |A|\,(e_{i_1} \wedge \cdots \wedge e_{i_m} \wedge e_{i_{m+1}} \wedge \cdots \wedge e_{i_n}) \\ &= |A|\,(\varphi_I \wedge \varphi_{\bar{I}}). \end{aligned} \tag{10.67}$$


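Let us calculate the left-hand side of relationship (10.67), applying formula (10.22) to the two different groups of vectors.

First, let us set $a_1 = \mathcal{A}(e_{i_1}), \dots, a_m = \mathcal{A}(e_{i_m})$. Then from (10.22), we obtain

$$\mathcal{A}(e_{i_1}) \wedge \cdots \wedge \mathcal{A}(e_{i_m}) = \sum_{J \in \overrightarrow{\mathbb{N}}_n^m} M_{IJ}\, \varphi_J, \tag{10.68}$$

where $I = (i_1, \dots, i_m)$, and $J$ runs through all collections from the set $\overrightarrow{\mathbb{N}}_n^m$.

Now let us replace the number $m$ by $n - m$ in (10.22) and apply the formula thus obtained to the vectors $a_1 = \mathcal{A}(e_{i_{m+1}}), \dots, a_{n-m} = \mathcal{A}(e_{i_n})$. As a result, we obtain the equality

$$\mathcal{A}(e_{i_{m+1}}) \wedge \cdots \wedge \mathcal{A}(e_{i_n}) = \sum_{J' \in \overrightarrow{\mathbb{N}}_n^{n-m}} M_{\bar{I}J'}\, \varphi_{J'}, \tag{10.69}$$

where $\bar{I} = (i_{m+1}, \dots, i_n)$, and $J'$ runs through all collections in the set $\overrightarrow{\mathbb{N}}_n^{n-m}$.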


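Substituting the expressions (10.68) and (10.69) into the left-hand side of (10.67), we obtain the equality

$$\sum_{J \in \overrightarrow{\mathbb{N}}_n^m}\ \sum_{J' \in \overrightarrow{\mathbb{N}}_n^{n-m}} M_{IJ}\, M_{\bar{I}J'}\, \varphi_J \wedge \varphi_{J'} = |A|\, \varphi_I \wedge \varphi_{\bar{I}}. \tag{10.70}$$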


Let us calculate the exterior product $\varphi_I \wedge \varphi_{\bar{I}}$ for $p = m$ and $q = n - m$, making use of the multiplication table (10.55) that was obtained at the end of the previous section. In this case, it is obvious that the collection $K$ obtained by the union of $I$ and $\bar{I}$ is equal to $(1, \dots, n)$, and we have only to calculate the number $\varepsilon(I, \bar{I}) = \pm 1$, which depends on whether the number of transpositions to get from $(i_1, \dots, i_m, i_{m+1}, \dots, i_n)$ to $K = (1, \dots, n)$ is even or odd. It is not difficult to see (using, for example, the same reasoning as in Sect. 2.6) that the sign $\varepsilon(I, \bar{I})$ is determined by the parity of the number of pairs $(i, \bar{\imath})$, where $i \in I$ and $\bar{\imath} \in \bar{I}$, for which the indices $i$ and $\bar{\imath}$ are in reverse order (form an inversion), that is, $i > \bar{\imath}$. By definition, all indices less than $i_1$ appear in $\bar{I}$, and consequently, they form an inversion with $i_1$. This gives us $i_1 - 1$ pairs. Further, all numbers less than $i_2$ and belonging to $\bar{I}$ form an inversion with the index $i_2$, that is, all numbers less than $i_2$ with the exception of $i_1$, which belongs to $I$ and not $\bar{I}$. This gives $i_2 - 2$ pairs.

Continuing in this way to the end, we obtain that the number of pairs $(i, \bar{\imath})$ forming an inversion is equal to $(i_1 - 1) + (i_2 - 2) + \cdots + (i_m - m)$, that is, equal to $|I| - \mu$, where $\mu = 1 + \cdots + m = \frac{1}{2} m(m + 1)$. Consequently, we finally obtain the formula $\varphi_I \wedge \varphi_{\bar{I}} = (-1)^{|I| - \mu} \varphi_K$, where $K = (1, \dots, n)$.
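
The inversion count used in this argument can be confirmed by brute force. The sketch below is an illustration only; the function names `cross_inversions` and `formula_holds` are hypothetical. For every increasing collection $I$ of $m$ indices from $1, \dots, n$, it counts the pairs consisting of an index of $I$ and a smaller index of the complement, and compares the count with $|I| - \frac{1}{2} m(m+1)$.

```python
from itertools import combinations


def cross_inversions(I, n):
    """Number of pairs (i, j) with i in I, j in the complement of I in 1..n, i > j."""
    complement = [j for j in range(1, n + 1) if j not in I]
    return sum(1 for i in I for j in complement if i > j)


def formula_holds(n, m):
    """Check that the count equals |I| - m(m+1)/2 for every increasing I of size m."""
    mu = m * (m + 1) // 2
    return all(
        cross_inversions(I, n) == sum(I) - mu
        for I in combinations(range(1, n + 1), m)
    )


# Example: I = (2, 4) in (1, ..., 5); the inversions are (2,1), (4,1), (4,3).
example = cross_inversions((2, 4), 5)
all_ok = all(formula_holds(n, m) for n in range(2, 7) for m in range(1, n))
print(example, all_ok)
```

Because both $I$ and its complement are written in increasing order, these cross pairs are the only inversions of the sequence $(i_1, \dots, i_m, i_{m+1}, \dots, i_n)$, which is why their number alone fixes the sign.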




The exterior product $\varphi_J \wedge \varphi_{J'}$ is equal to zero for all $J$ and $J'$, with the exception only of the case that $J' = \bar{J}$, that is, the collections $J$ and $J'$ are disjoint and complement each other. By what we have said above, $\varphi_J \wedge \varphi_{\bar{J}} = (-1)^{|J| - \mu} \varphi_K$. Thus from (10.70) we obtain the equality

$$\sum_{J \in \overrightarrow{\mathbb{N}}_n^m} M_{IJ}\, M_{\bar{I}\bar{J}}\, (-1)^{|J| - \mu}\, \varphi_K = |A|\, (-1)^{|I| - \mu}\, \varphi_K. \tag{10.71}$$

Multiplying both sides of equality (10.71) by the number $(-1)^{|I| + \mu}$, taking into account the obvious identity $(-1)^{2|I|} = 1$, we finally obtain

$$\sum_{J \in \overrightarrow{\mathbb{N}}_n^m} M_{IJ}\, M_{\bar{I}\bar{J}}\, (-1)^{|I| + |J|} = |A|,$$

which, taking into account definition (10.66), gives us the required equality.
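
Laplace's theorem can likewise be tested on a concrete matrix. The sketch below is a hedged illustration: the names `det`, `submatrix`, and `laplace_along_columns` and the $4 \times 4$ matrix `A` are assumptions introduced here, and indices are 0-based. Shifting every index by one changes $|I| + |J|$ by the even number $2m$, so the sign $(-1)^{|I|+|J|}$ is the same in either convention.

```python
from itertools import combinations


def det(m):
    """Determinant by expansion along the first row (fine for small matrices)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def submatrix(a, rows, cols):
    """Entries of a at the intersection of the given rows and columns."""
    return [[a[r][c] for c in cols] for r in rows]


def laplace_along_columns(a, cols):
    """Expand det(a) along the fixed (0-based, increasing) column indices cols."""
    n, m = len(a), len(cols)
    rest_cols = [c for c in range(n) if c not in cols]
    total = 0
    for rows in combinations(range(n), m):
        rest_rows = [r for r in range(n) if r not in rows]
        sign = (-1) ** (sum(rows) + sum(cols))  # same parity as with 1-based indices
        minor = det(submatrix(a, rows, cols))
        complementary = det(submatrix(a, rest_rows, rest_cols))
        total += sign * minor * complementary
    return total


# Example matrix (assumed); every choice of 1, 2, or 3 columns is tried.
A = [[2, 0, 1, 3], [1, 4, 0, 2], [0, 1, 5, 1], [3, 2, 1, 0]]
expansions = [
    laplace_along_columns(A, cols)
    for m in (1, 2, 3)
    for cols in combinations(range(4), m)
]
print(det(A), set(expansions))
```

Every expansion returns the same value, namely the determinant of `A`, regardless of which columns are fixed.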


Example 10.35 We began this section with Example 10.30, in which we investigated in detail the space $\Lambda^p(\mathsf{L})$ for $p = n$. Let us now consider the case $p = n - 1$. As a result of the general relationship $\dim \Lambda^p(\mathsf{L}) = C_n^p$, we obtain that $\dim \Lambda^{n-1}(\mathsf{L}) = n$.

Having chosen an arbitrary basis $e_1, \dots, e_n$ in the space $\mathsf{L}$, we assign to every vector $z \in \Lambda^{n-1}(\mathsf{L})$ the linear function $f(x)$ on $\mathsf{L}$ defined by the condition

$$z \wedge x = f(x)\,(e_1 \wedge \cdots \wedge e_n), \quad x \in \mathsf{L}.$$


For this, it is necessary to recall that $z \wedge x$ belongs to the one-dimensional space $\Lambda^n(\mathsf{L})$, and the vector $e_1 \wedge \cdots \wedge e_n$ constitutes there a basis. The linearity of the function $f(x)$ follows from the properties of the exterior product proved above. Let us verify that the linear transformation

$$\mathcal{F} : \Lambda^{n-1}(\mathsf{L}) \to \mathsf{L}^*$$

thus constructed is an isomorphism. Since $\dim \Lambda^{n-1}(\mathsf{L}) = \dim \mathsf{L}^* = n$, to show this, it suffices to verify that the kernel of the transformation $\mathcal{F}$ is equal to $(0)$. As we know, it is possible to select as the basis of the space $\Lambda^{n-1}(\mathsf{L})$ the vectors

$$e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_{n-1}}, \quad i_k \in \{1, \dots, n\},$$


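uniquely up to a permutation of the collection $(i_1, \dots, i_{n-1})$; these are all the numbers $(1, \dots, n)$ except for one. This means that as the basis of $\Lambda^{n-1}(\mathsf{L})$ one can choose the vectors

$$u_i = e_1 \wedge \cdots \wedge e_{i-1} \wedge \widehat{e_i} \wedge e_{i+1} \wedge \cdots \wedge e_n, \quad i = 1, \dots, n, \tag{10.72}$$

where the mark over $e_i$ indicates that this factor is omitted. It is clear that $u_i \wedge e_j = 0$ if $i \ne j$, and $u_i \wedge e_i = \pm e_1 \wedge \cdots \wedge e_n$ for all $i = 1, \dots, n$. Let us assume that $z \in \Lambda^{n-1}(\mathsf{L})$ is a nonnull vector such that its associated linear function $f(x)$ is equal to zero for every $x \in \mathsf{L}$. Let us set $z = z_1 u_1 + \cdots + z_n u_n$. Then from our assumption, it follows that $z \wedge x = 0$ for all $x \in \mathsf{L}$, and in particular, for the vectors $e_1, \dots, e_n$. Since $z \wedge e_i = \pm z_i\,(e_1 \wedge \cdots \wedge e_n)$, from this follow the equalities $z_1 = 0, \dots, z_n = 0$ and hence $z = 0$, which contradicts the assumption that $z$ is nonnull.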

The constructed isomorphism $\mathcal{F} : \Lambda^{n-1}(\mathsf{L}) \to \mathsf{L}^*$ is a refinement of the following fact that we encountered earlier: the Plücker coordinates of a hyperplane can be arbitrary numbers; in this dimension, the Plücker relations do not yet appear.

Let us now assume that the space $\mathsf{L}$ is an oriented Euclidean space. On the one hand, this determines a fixed basis (10.58) in $\Lambda^n(\mathsf{L})$ if $e_1, \dots, e_n$ is an arbitrary positively oriented orthonormal basis of $\mathsf{L}$, so that the isomorphism $\mathcal{F} : \Lambda^{n-1}(\mathsf{L}) \to \mathsf{L}^*$ constructed above is uniquely determined. On the other hand, for a Euclidean space, there is defined the standard isomorphism $\mathsf{L}^* \to \mathsf{L}$, which does not require the selection of any basis at all in $\mathsf{L}$ (see p. 214). Combining these two isomorphisms, we obtain the isomorphism

$$\mathcal{G} : \Lambda^{n-1}(\mathsf{L}) \to \mathsf{L},$$

which assigns to the element $z \in \Lambda^{n-1}(\mathsf{L})$ the vector $x \in \mathsf{L}$ such that

$$z \wedge y = (x, y)\,(e_1 \wedge \cdots \wedge e_n) \tag{10.73}$$

for every vector $y \in \mathsf{L}$ and for the positively oriented orthonormal basis $e_1, \dots, e_n$, where $(x, y)$ denotes the inner product in the space $\mathsf{L}$.

Let us consider this isomorphism in greater detail. We saw earlier that the vectors $u_i$ determined by formula (10.72) form a basis of the space $\Lambda^{n-1}(\mathsf{L})$. To describe the constructed isomorphism, it suffices to determine which vector $b \in \mathsf{L}$ corresponds to the vector $a_1 \wedge \cdots \wedge a_{n-1}$, $a_i \in \mathsf{L}$. We may suppose that the vectors $a_1, \dots, a_{n-1}$ are linearly independent, since otherwise, the vector $a_1 \wedge \cdots \wedge a_{n-1}$ would equal $0$, and therefore to it would correspond the vector $b = 0$. Taking into account formula (10.73), this correspondence implies the equality

$$(b, y)\,(e_1 \wedge \cdots \wedge e_n) = a_1 \wedge \cdots \wedge a_{n-1} \wedge y, \tag{10.74}$$

satisfied by all $y \in \mathsf{L}$. Since the vector on the right-hand side of (10.74) is the null vector if $y$ belongs to the subspace $\mathsf{L}_1 = \langle a_1, \dots, a_{n-1} \rangle$, we may assume that $b \in \mathsf{L}_1^{\perp}$.


Now we must recall that we have an orientation and consider $\mathsf{L}$ and $\mathsf{L}_1$ to be oriented (it is easy to ascertain that the orientation of the space $\mathsf{L}$ does not determine a natural orientation of the subspace $\mathsf{L}_1$, and so we must choose and fix the orientation of $\mathsf{L}_1$ separately). Then we may choose the basis $e_1, \dots, e_n$ in such a way that it is orthonormal and positively oriented and also such that the first $n - 1$ vectors $e_1, \dots, e_{n-1}$ belong to the subspace $\mathsf{L}_1$ and define in it an orthonormal and positively oriented basis (it is always possible to attain this, possibly after replacing the vector $e_n$ with its opposite).

Since the vector $b$ is contained in the one-dimensional subspace $\mathsf{L}_1^{\perp} = \langle e_n \rangle$, it follows that $b = \beta e_n$. Using the previous arguments, we obtain that

$$a_1 \wedge \cdots \wedge a_{n-1} = v(a_1, \dots, a_{n-1})\, e_1 \wedge \cdots \wedge e_{n-1},$$

where $v(a_1, \dots, a_{n-1})$ is the oriented volume of the parallelepiped spanned by the vectors $a_1, \dots, a_{n-1}$ (see the definition on p. 221). This observation determines the number $\beta$.

Indeed, substituting the vector $y = e_n$ into (10.74) and taking into account the fact that the basis $e_1, \dots, e_n$ was chosen to be orthonormal and positively oriented (from which follows, in particular, the equality $v(e_1, \dots, e_n) = 1$), we obtain the relationship

$$\beta = v(a_1, \dots, a_{n-1}, e_n) = v(a_1, \dots, a_{n-1}).$$


Thus the isomorphism $\mathcal{G}$ constructed above assigns to the vector $a_1 \wedge \cdots \wedge a_{n-1}$ the vector $b = v(a_1, \dots, a_{n-1})\, e_n$, where $e_n$ is the unit vector on the line $\mathsf{L}_1^{\perp}$ chosen with the sign making the basis $e_1, \dots, e_n$ of the space $\mathsf{L}$ orthonormal and positively oriented. As is easily verified, this is equivalent to the requirement that the basis $a_1, \dots, a_{n-1}, e_n$ be positively oriented.


The final result is contained in the following theorem. 


Theorem 10.36 For every oriented Euclidean space $\mathsf{L}$, the isomorphism

$$\mathcal{G} : \Lambda^{n-1}(\mathsf{L}) \to \mathsf{L}$$

assigns to the vector $a_1 \wedge \cdots \wedge a_{n-1}$ the vector $b \in \mathsf{L}$, which is orthogonal to the vectors $a_1, \dots, a_{n-1}$ and whose length is equal to the unoriented volume $V(a_1, \dots, a_{n-1})$, or more precisely,

$$b = V(a_1, \dots, a_{n-1})\, e, \tag{10.75}$$

where $e \in \mathsf{L}$ is a vector of unit length orthogonal to the vectors $a_1, \dots, a_{n-1}$ and chosen in such a way that the basis $a_1, \dots, a_{n-1}, e$ is positively oriented.

The vector $b$ determined by the relationship (10.75) is called the vector product of the vectors $a_1, \dots, a_{n-1}$ and is denoted by $[a_1, \dots, a_{n-1}]$. In the case $n = 3$, this definition gives us the vector product of two vectors $[a_1, a_2]$ familiar from analytic geometry.
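
In coordinates with respect to a positively oriented orthonormal basis, relationship (10.74) says that $(b, y)$ is the determinant of the matrix with columns $a_1, \dots, a_{n-1}, y$. Expanding this determinant along its last column gives the coordinates of $b$ as signed minors. The following sketch rests on that coordinate assumption; the names `det`, `dot`, and `vector_product` and the sample vectors are hypothetical and serve only as an illustration.

```python
def det(m):
    """Determinant by expansion along the first row (fine for small matrices)."""
    if not m:
        return 1
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


def dot(x, y):
    """Inner product in orthonormal coordinates."""
    return sum(p * q for p, q in zip(x, y))


def vector_product(vectors):
    """Vector product [a_1, ..., a_{n-1}] of n-1 vectors with n coordinates each.

    Coordinate k (0-based) is the cofactor of y_k in det[a_1, ..., a_{n-1}, y],
    that is, (-1)^(k+n-1) times the minor obtained by deleting coordinate k.
    """
    n = len(vectors) + 1
    result = []
    for k in range(n):
        minor = det([[v[c] for c in range(n) if c != k] for v in vectors])
        result.append((-1) ** (k + n - 1) * minor)
    return result


# n = 3: the familiar vector product of two vectors.
cross3 = vector_product([[1, 2, 3], [4, 5, 6]])

# n = 4: three sample vectors (assumed data).
a = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 1, 0, 1]]
b = vector_product(a)
gram = [[dot(u, v) for v in a] for u in a]
print(cross3, b)
```

In the four-dimensional example, `b` is orthogonal to each of the three vectors, and its squared length equals the Gram determinant of the three vectors, which is the square of the unoriented volume $V(a_1, a_2, a_3)$.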


Chapter 11 
Quadrics 


We have encountered a number of types of spaces consisting of points (affine, affine Euclidean, projective). For all of these spaces, an interesting and important question has been the study of quadrics contained in such spaces, that is, sets of points with coordinates $(x_1, \dots, x_n)$ that in some coordinate system satisfy the single equation

$$F(x_1, \dots, x_n) = 0, \tag{11.1}$$

where $F$ is a second-degree polynomial in the variables $x_1, \dots, x_n$. Let us focus our attention on the fact that by the definition of a polynomial, it is possible in general for there to be present in equation (11.1) both first- and second-degree monomials as well as a constant term.

For each of the spaces of the above-mentioned types, a trivial verification shows 
that the property of a set of points being a quadric does not depend on the choice of 
coordinate system. Or in other words, a nonsingular affine transformation, motion, 
or projective transformation (depending on the type of space under consideration) 
takes a quadric to a quadric. 


11.1 Quadrics in Projective Space 


By the definition given above, a quadric Q in the projective space P(L) is given by 
equation (11.1) in homogeneous coordinates. However, as we saw in Chap. 9, such 
an equation is satisfied by the homogeneous coordinates of a point of the projective 
space P(L) only if its left-hand side is homogeneous. 


Definition 11.1 A quadric in a projective space $\mathbb{P}(\mathsf{L})$ is a set $Q$ consisting of points defined by equation (11.1), where $F$ is a homogeneous second-degree polynomial, that is, a quadratic form in the coordinates $x_0, x_1, \dots, x_n$.


I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry,
DOI 10.1007/978-3-642-30994-6_11, © Springer-Verlag Berlin Heidelberg 2013




In Sect. 6.2, it was proved that in some coordinate system (that is, in some basis of the space $\mathsf{L}$), equation (11.1) is reduced to the canonical form

$$\lambda_0 x_0^2 + \lambda_1 x_1^2 + \cdots + \lambda_r x_r^2 = 0,$$

where all the coefficients $\lambda_i$ are nonzero. Here the number $r \le n$ is determined by the rank of the quadratic form $F$ (the rank is the number $r + 1$ of nonzero coefficients), and it is the same for every system of coordinates in which the form $F$ is reduced to canonical form. In the sequel, we shall assume that the quadratic form $F$ is nonsingular, that is, that $r = n$. We shall also call the associated quadric $Q$ nonsingular. The canonical form of its equation can then be written as follows:

$$\alpha_0 x_0^2 + \alpha_1 x_1^2 + \cdots + \alpha_n x_n^2 = 0, \tag{11.2}$$

where all the coefficients $\alpha_i$ are nonzero. The general case differs from (11.2) only in the omission of terms containing $x_i$ with $i = r + 1, \dots, n$. It is therefore easily reduced to the case of a nonsingular quadric.

We have already encountered the concept of a tangent space to an arbitrary 
smooth hypersurface (in Chap. 7) or to a projective algebraic variety (in Chap. 9). 
Now we move on to a consideration of the notion of the tangent space to a quadric. 


Definition 11.2 If $A$ is a point on the quadric $Q$ given by equation (11.1), then the tangent space to $Q$ at the point $A \in Q$ is defined as the projective space $T_A Q$ given by the equation

$$\sum_{i=0}^{n} \frac{\partial F}{\partial x_i}(A)\, x_i = 0. \tag{11.3}$$


The tangent space is an important general mathematical concept, and we shall now discuss it in the greatest possible generality. Within the framework of a course in algebra, it is natural to limit ourselves to the case in which $F$ is a homogeneous polynomial of arbitrary degree $k > 0$. Then equation (11.1) defines in the space $\mathbb{P}(\mathsf{L})$ some hypersurface $X$, and if not all the partial derivatives $\frac{\partial F}{\partial x_i}(A)$ are equal to zero, then equation (11.3) gives the tangent hyperplane to the hypersurface $X$ at the point $A$. We see that in equation (11.3), on the left-hand side appears the differential $d_A F(x)$ (see Example 3.86 on p. 130), and since this notion was defined so as to be invariant with respect to the choice of coordinate system, the notion of tangent space is also independent of such a choice. The tangent space to the hypersurface $X$ at the point $A$ is denoted by $T_A X$.

In the sequel, we shall always assume that quadrics are viewed as lying in spaces over a field $\mathbb{K}$ of characteristic different from 2 (for example, for definiteness, we may assume that the field $\mathbb{K}$ is either $\mathbb{R}$ or $\mathbb{C}$). If $F(x)$ is a quadratic form, then by the assumptions we have made, we can write it in the form

$$F(x) = \sum_{i,j=0}^{n} a_{ij} x_i x_j, \tag{11.4}$$

where the coefficients satisfy $a_{ij} = a_{ji}$. In other words, $F(x) = \varphi(x, x)$, where

$$\varphi(x, y) = \sum_{i,j=0}^{n} a_{ij} x_i y_j \tag{11.5}$$

is a symmetric bilinear form (Theorem 6.6). If the point $A$ corresponds to the vector $a$ with coordinates $(\alpha_0, \alpha_1, \dots, \alpha_n)$, then

$$\frac{\partial F}{\partial x_i}(A) = 2 \sum_{j=0}^{n} a_{ij} \alpha_j,$$

and therefore, equation (11.3) takes the form

$$\sum_{i,j=0}^{n} a_{ij} \alpha_j x_i = 0,$$

or equivalently, $\varphi(a, x) = 0$. Thus in this case, the tangent hyperplane at the point $A$ coincides with the orthogonal complement $\langle a \rangle^{\perp}$ to the vector $a \in \mathsf{L}$ with respect to the bilinear form $\varphi(x, y)$.
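
As a small illustration of the last formulas, the sketch below computes the coefficients of the tangent hyperplane of a conic at one of its points. The matrix, the point, and the helper names `phi` and `gradient` are assumptions chosen for the example: the conic is $x_1^2 + x_2^2 - x_0^2 = 0$ and the point is $A = (5 : 3 : 4)$.

```python
def phi(matrix, x, y):
    """Symmetric bilinear form phi(x, y) = sum of a_ij * x_i * y_j."""
    n = len(matrix)
    return sum(matrix[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


def gradient(matrix, a):
    """Partial derivatives of F(x) = phi(x, x) at the vector a: 2 * sum_j a_ij * a_j."""
    n = len(matrix)
    return [2 * sum(matrix[i][j] * a[j] for j in range(n)) for i in range(n)]


# F = -x0^2 + x1^2 + x2^2, coordinates ordered as (x0, x1, x2).
M = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]
a = [5, 3, 4]
on_quadric = phi(M, a, a) == 0
g = gradient(M, a)  # coefficients of the tangent hyperplane at A
print(on_quadric, g)
```

The tangent line is therefore $-10 x_0 + 6 x_1 + 8 x_2 = 0$. For any vector $x$, the left-hand side of (11.3) equals $2\varphi(a, x)$, so the hyperplane is exactly the set where $\varphi(a, x) = 0$, and it contains $A$ because $\varphi(a, a) = F(a) = 0$.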

The definition of tangent space (11.3) loses sense if all derivatives $\frac{\partial F}{\partial x_i}(A)$ are equal to zero:

$$\frac{\partial F}{\partial x_i}(A) = 0, \quad i = 0, 1, \dots, n. \tag{11.6}$$

A point $A$ of the hypersurface $X$ given by equation (11.1) for which equalities (11.6) are satisfied is called a singular or critical point. If a hypersurface has no singular points, then it is said to be smooth. When the hypersurface $X$ is a quadric, that is, the polynomial $F$ is a quadratic form (11.4), then equations (11.6) assume the form

$$\sum_{j=0}^{n} a_{ij} \alpha_j = 0, \quad i = 0, 1, \dots, n.$$

Since the point $A$ is in $\mathbb{P}(\mathsf{L})$, it follows that not all of its coordinates $\alpha_i$ are equal to zero. Thus singular points of a quadric $Q$ are the nonzero solutions of the system of equations

$$\sum_{j=0}^{n} a_{ij} x_j = 0, \quad i = 0, 1, \dots, n. \tag{11.7}$$

As was shown in Chap. 2, such solutions exist only if the determinant of the matrix $(a_{ij})$ is equal to zero, and that is equivalent to saying that the quadric $Q$ is singular. Thus a nonsingular quadric is the same thing as a smooth quadric.

Let us consider the possible mutual relationships between a quadric $Q$ and a line $l$ in the projective space $\mathbb{P}(\mathsf{L})$. First, let us show that either the line $l$ has not more than two points in common with the quadric $Q$, or else it lies entirely in $Q$.




Indeed, if a line $l$ is not contained entirely in $Q$, then one can choose a point $A \in l$, $A \notin Q$. Let the line $l$ correspond to some plane $\mathsf{L}' \subset \mathsf{L}$, that is, $l = \mathbb{P}(\mathsf{L}')$. If $A = \langle a \rangle$, then $\mathsf{L}' = \langle a, b \rangle$, where the vector $b \in \mathsf{L}$ is not collinear with the vector $a$. In other words, the plane $\mathsf{L}'$ consists of all vectors of the form $xa + yb$, where $x$ and $y$ range over all possible scalars. The points of intersection of the line $l$ and the quadric $Q$ are found from the equation $F(xa + yb) = 0$, that is, from the equation

$$F(xa + yb) = \varphi(xa + yb, xa + yb) = F(a)x^2 + 2\varphi(a, b)xy + F(b)y^2 = 0 \tag{11.8}$$

in the variables $x, y$. The vectors $xa + yb$ with $y = 0$ give us the point $A \notin Q$. Assuming, therefore, that $y \ne 0$, we set $t = x/y$. Then (11.8) gives us a quadratic equation in the variable $t$:

$$F(xa + yb) = y^2\bigl(F(a)t^2 + 2\varphi(a, b)t + F(b)\bigr) = 0.$$

The condition $A \notin Q$ has the form $F(a) \ne 0$. Consequently, the leading coefficient of the quadratic trinomial $F(a)t^2 + 2\varphi(a, b)t + F(b)$ is nonzero, and therefore, the quadratic trinomial itself is not identically zero and cannot have more than two roots.

Let us now consider the mutual arrangement of $Q$ and $l$ if the line $l$ passes through the point $A \in Q$. Then, as in the previous case, the points of $l \cap Q$ correspond to the solutions of the quadratic equation (11.8), in which $F(a) = 0$, since $A \in Q$. Thus we obtain the equation

$$F(xa + yb) = 2\varphi(a, b)xy + F(b)y^2 = y\bigl(2\varphi(a, b)x + F(b)y\bigr) = 0. \tag{11.9}$$

One solution of equation (11.9) is obvious: $y = 0$. It precisely corresponds to the point $A \in Q$. This solution is unique if and only if $\varphi(a, b) = 0$, that is, if $b \in T_A Q$. In the latter case, clearly $l \subset T_A Q$, and one says that the line $l$ is tangent to the quadric $Q$ at the point $A$.

Thus there are four possible cases of the relationship between a nonsingular quadric $Q$ and a line $l$:

(1) The line $l$ has no points in common with the quadric $Q$.

(2) The line $l$ has precisely two distinct points in common with the quadric $Q$.

(3) The line $l$ has exactly one point $A$ in common with the quadric $Q$, which is possible if and only if $l \subset T_A Q$.

(4) The line $l$ lies entirely in $Q$.
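
The four cases can be distinguished computationally from the three numbers $F(a)$, $\varphi(a, b)$, $F(b)$ in equation (11.8). The sketch below assumes the field of real numbers and integer input data; the function names `phi` and `classify_line` and the example quadrics are hypothetical. When the binary form in (11.8) is not identically zero, the number of its projective roots $(x : y)$ is read off from the sign of $\varphi(a, b)^2 - F(a)F(b)$.

```python
def phi(matrix, x, y):
    """Symmetric bilinear form phi(x, y) = sum of a_ij * x_i * y_j."""
    n = len(matrix)
    return sum(matrix[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


def classify_line(matrix, a, b):
    """Relationship between the real quadric phi(x, x) = 0 and the line through a, b."""
    fa, fab, fb = phi(matrix, a, a), phi(matrix, a, b), phi(matrix, b, b)
    if fa == 0 and fab == 0 and fb == 0:
        return "line lies in Q"  # the form in (11.8) vanishes identically
    disc = fab * fab - fa * fb
    if disc > 0:
        return "two points"
    if disc == 0:
        return "one point"  # for a nonsingular Q this is the tangent case
    return "no points"


# Circle x1^2 + x2^2 - x0^2 = 0, coordinates ordered as (x0, x1, x2).
circle = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]
secant = classify_line(circle, [1, 0, 0], [0, 1, 0])   # the line x2 = 0
tangent = classify_line(circle, [1, 1, 0], [0, 0, 1])  # the line x1 = x0
outside = classify_line(circle, [1, 2, 0], [0, 0, 1])  # the line x1 = 2*x0

# Quadric 2*x0*x1 - 2*x2*x3 = 0 in three-dimensional projective space.
ruled = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, -1, 0]]
inside = classify_line(ruled, [1, 0, 0, 0], [0, 0, 1, 0])
print(secant, tangent, outside, inside)
```

Over the complex numbers the "no points" case does not occur, since a nonzero binary quadratic form always has a root there; the real field was chosen here so that all four cases can be exhibited.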


Of course, there also exist smooth hypersurfaces defined by equation (11.1) of arbitrary degree $k \ge 1$. For example, such a hypersurface is given by the equation $c_0 x_0^k + c_1 x_1^k + \cdots + c_n x_n^k = 0$, where all the $c_i$ are nonzero. In the sequel, we shall consider only smooth hypersurfaces. For these, the left-hand side of equation (11.3) is a nonnull linear form on the vector space $\mathsf{L}$, and this means that it determines a hyperplane in $\mathsf{L}$ and in $\mathbb{P}(\mathsf{L})$.

Let us verify that this hyperplane contains the point $A$. This means that if the point $A$ corresponds to the vector $a = (\alpha_0, \alpha_1, \dots, \alpha_n)$, then

$$\sum_{i=0}^{n} \frac{\partial F}{\partial x_i}(A)\, \alpha_i = 0.$$

If the degree of the homogeneous polynomial $F$ is equal to $k$, then by Euler's identity (3.68), we have the equality

$$\sum_{i=0}^{n} \frac{\partial F}{\partial x_i}(A)\, \alpha_i = \Bigl(\sum_{i=0}^{n} x_i \frac{\partial F}{\partial x_i}\Bigr)(A) = k F(A).$$

The value of $F(A)$ is equal to zero, since the point $A$ lies on the hypersurface $X$ given by the equation $F(x) = 0$.
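
A short worked example may make the use of Euler's identity concrete. The cubic form and the point below are assumptions chosen only for illustration; they do not come from the text.

```latex
% Illustrative example (assumed data): a cubic form, so k = 3.
F(x_0, x_1, x_2) = x_0 x_1^2 - x_2^3, \qquad
\frac{\partial F}{\partial x_0} = x_1^2, \quad
\frac{\partial F}{\partial x_1} = 2 x_0 x_1, \quad
\frac{\partial F}{\partial x_2} = -3 x_2^2 .

% Euler's identity:
x_0 \cdot x_1^2 + x_1 \cdot 2 x_0 x_1 + x_2 \cdot (-3 x_2^2)
  = 3 x_0 x_1^2 - 3 x_2^3 = 3 F .

% At the point A = (1 : 1 : 1) we have F(A) = 0, and equation (11.3) reads
x_0 + 2 x_1 - 3 x_2 = 0 ,
% which is satisfied by A itself, since 1 + 2 - 3 = 0.
```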

Now to switch to a more familiar situation, let us consider the affine subset of $\mathbb{P}(\mathsf{L})$ given by the condition $x_0 \ne 0$, and let us introduce in it the inhomogeneous coordinates

$$y_i = x_i / x_0, \quad i = 1, \dots, n. \tag{11.10}$$

Let us assume that the point $A$ lies in this subset (that is, its coordinate $\alpha_0$ is nonzero) and let us write equation (11.3) in the coordinates $y_i$. To do so, we must move from the variables $x_0, x_1, \dots, x_n$ to the variables $y_1, \dots, y_n$ and rewrite equation (11.3) accordingly. Here we must set

$$F(x_0, x_1, \dots, x_n) = x_0^k f(y_1, \dots, y_n), \tag{11.11}$$

where $f(y_1, \dots, y_n)$ is a polynomial of degree $k \ge 1$, already not necessarily homogeneous (in contrast to $F$). In accord with formula (11.10), let us denote by $a_1, \dots, a_n$ the inhomogeneous coordinates of the point $A$, that is,

$$a_i = \alpha_i / \alpha_0, \quad i = 1, \dots, n.$$

Using general rules for the calculation of partial derivatives, from the representation (11.11), taking into account (11.10), we obtain the formulas

$$\frac{\partial F}{\partial x_0} = k x_0^{k-1} f + x_0^k \sum_{l=1}^{n} \frac{\partial f}{\partial y_l} \frac{\partial y_l}{\partial x_0} = k x_0^{k-1} f - x_0^{k-1} \sum_{l=1}^{n} \frac{\partial f}{\partial y_l}\, y_l,$$

$$\frac{\partial F}{\partial x_i} = x_0^k \frac{\partial f}{\partial y_i} \cdot \frac{1}{x_0} = x_0^{k-1} \frac{\partial f}{\partial y_i}, \quad i = 1, \dots, n.$$



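Now let us find the values of the derivatives calculated above of the function $F$ at the point $A$ with inhomogeneous coordinates $a_1, \dots, a_n$. The value of $F(A)$ is zero, since the point $A$ lies in the hypersurface $X$ and $x_0 \ne 0$. By virtue of the representation (11.11), we obtain from this that $f(a_1, \dots, a_n) = 0$. For brevity, we shall employ the notation $f(A) = f(a_1, \dots, a_n)$ and $\frac{\partial f}{\partial y_i}(A) = \frac{\partial f}{\partial y_i}(a_1, \dots, a_n)$. Thus from the two previous relationships, we obtain

$$\frac{\partial F}{\partial x_0}(A) = -\alpha_0^{k-1} \sum_{l=1}^{n} \frac{\partial f}{\partial y_l}(A)\, a_l, \qquad \frac{\partial F}{\partial x_i}(A) = \alpha_0^{k-1} \frac{\partial f}{\partial y_i}(A), \quad i = 1, \dots, n. \tag{11.12}$$

On substituting expression (11.12) into (11.3), and taking into account (11.10), we obtain the equation

$$-\alpha_0^{k-1} x_0 \sum_{l=1}^{n} \frac{\partial f}{\partial y_l}(A)\, a_l + \alpha_0^{k-1} \sum_{i=1}^{n} \frac{\partial f}{\partial y_i}(A)\, x_i = \alpha_0^{k-1} x_0 \sum_{i=1}^{n} \frac{\partial f}{\partial y_i}(A)\,(y_i - a_i) = 0.$$

Canceling the nonzero common factor $\alpha_0^{k-1} x_0$, we finally obtain

$$\sum_{i=1}^{n} \frac{\partial f}{\partial y_i}(A)\,(y_i - a_i) = 0. \tag{11.13}$$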

This is precisely the equation of the tangent hyperplane $T_A X$ in inhomogeneous coordinates. In analysis and geometry, it is written in the form (11.13) for a function $f$ of a much more general class than that of polynomials.
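
To see (11.3) and (11.13) side by side, consider a simple worked example. The conic and the point are assumptions chosen for illustration, not data from the text.

```latex
% Illustrative example (assumed data): the conic F = x_1^2 + x_2^2 - x_0^2, so k = 2.
F(x_0, x_1, x_2) = x_0^2\, f(y_1, y_2), \qquad f(y_1, y_2) = y_1^2 + y_2^2 - 1 .

% Point A = (5 : 3 : 4), with inhomogeneous coordinates a_1 = 3/5, a_2 = 4/5.
\frac{\partial f}{\partial y_1}(A) = \frac{6}{5}, \qquad
\frac{\partial f}{\partial y_2}(A) = \frac{8}{5} .

% Equation (11.13):
\frac{6}{5}\Bigl(y_1 - \frac{3}{5}\Bigr) + \frac{8}{5}\Bigl(y_2 - \frac{4}{5}\Bigr) = 0
\quad\Longleftrightarrow\quad 3 y_1 + 4 y_2 = 5 .

% Equation (11.3) in homogeneous coordinates:
-10 x_0 + 6 x_1 + 8 x_2 = 0 ,
% and dividing by 2 x_0 gives the same line 3 y_1 + 4 y_2 = 5.
```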

We may now return to the case in which the hypersurface $X = Q$ is a nonsingular (and therefore smooth) quadric. Then for every point $A \in Q$, equation (11.3) determines a hyperplane in $\mathsf{L}$, that is, some line in the dual space $\mathsf{L}^*$, and therefore a point belonging to the space $\mathbb{P}(\mathsf{L}^*)$, which we shall denote by $\Phi(A)$. Thus we define the mapping

$$\Phi : Q \to \mathbb{P}(\mathsf{L}^*). \tag{11.14}$$

Our first task consists in determining what the set $\Phi(Q) \subset \mathbb{P}(\mathsf{L}^*)$ in fact is. For this, we express the quadratic form $F(x)$ in the form $F(x) = \varphi(x, x)$, where the symmetric bilinear form $\varphi(x, y)$ has the form (11.5). By Theorem 6.3, we can write $\varphi(x, y)$ uniquely as $\varphi(x, y) = (x, \mathcal{A}(y))$, where $\mathcal{A} : \mathsf{L} \to \mathsf{L}^*$ is some linear transformation. From the definitions, it follows that here, the radical of the form $\varphi$ coincides with the kernel of the linear transformation $\mathcal{A}$. Since in the case of a nonsingular form $F$, the radical of $\varphi$ is equal to $(0)$, it follows that the kernel of $\mathcal{A}$ is also equal to $(0)$. Since $\dim \mathsf{L} = \dim \mathsf{L}^*$, we have by Theorem 3.68 that the linear transformation $\mathcal{A}$ is an isomorphism, and there is thereby determined a projective transformation $\mathbb{P}(\mathcal{A}) : \mathbb{P}(\mathsf{L}) \to \mathbb{P}(\mathsf{L}^*)$.

Let us now write down our mapping (11.14) in coordinates. If the quadratic form 
F(x) is written in the form (11.4), then 


aF ” 
aay Hs i=0,1,...,n. 


On the other hand, in some basis $e_0, e_1, \ldots, e_n$ of the space $L$, the bilinear form
$\varphi(x, y)$ has the form (11.5), where the vectors $x$ and $y$ are given by
$x = x_0 e_0 + \cdots + x_n e_n$ and $y = y_0 e_0 + \cdots + y_n e_n$. From this, it follows that the
matrix of the transformation $\mathcal{A} : L \to L^*$ in the basis $e_0, e_1, \ldots, e_n$ of the space $L$
and in the dual basis $f_0, f_1, \ldots, f_n$ of the space $L^*$ is equal to $(a_{ij})$. Therefore, to
the quadratic form $F(x)$ is associated the isomorphism $\mathcal{A} : L \to L^*$, and the mapping
(11.14) that we constructed coincides with the restriction of the projective
transformation $\mathbb{P}(\mathcal{A}) : \mathbb{P}(L) \to \mathbb{P}(L^*)$ to $Q$, that is, $\Phi(Q) = \mathbb{P}(\mathcal{A})(Q)$.
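The coordinate description above can be tried on a small example. The following is only a sketch under stated assumptions: the conic $x_0^2 + x_1^2 - x_2^2 = 0$, its rational parametrization, and all function names are chosen here for illustration and do not come from the text. The tangent line at a point is computed as the row of coefficients $\sum_j a_{ij} x_j$, and tangents at distinct points turn out to be distinct.

```python
from fractions import Fraction

# Symmetric matrix (a_ij) of an illustrative nonsingular conic x0^2 + x1^2 - x2^2 = 0.
A = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, -1]]

def apply(matrix, x):
    """Coordinates of A(x) in the dual basis: the numbers sum_j a_ij * x_j."""
    return [sum(Fraction(a) * xj for a, xj in zip(row, x)) for row in matrix]

def pairing(y, x):
    """Value of the linear function with coordinates y on the vector x."""
    return sum(yi * xi for yi, xi in zip(y, x))

def proportional(u, v):
    """True if u and v define the same point of a projective space."""
    n = len(u)
    return all(u[i] * v[j] == u[j] * v[i] for i in range(n) for j in range(n))

def conic_point(t):
    """Rational point (1 - t^2 : 2t : 1 + t^2) of the conic above."""
    t = Fraction(t)
    return [1 - t * t, 2 * t, 1 + t * t]

points = [conic_point(t) for t in (0, 1, 2, Fraction(1, 3))]
tangents = [apply(A, p) for p in points]   # images of the points under the mapping (11.14)
```

Each tangent line passes through its point of tangency, and no two of the computed tangents coincide, in agreement with the bijectivity discussed next.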

From this arises an unexpected consequence: since the transformation $\mathbb{P}(\mathcal{A})$ is a
bijection, the transformation (11.14) is also a bijection onto its image. In other words,
the tangent hyperplanes to the nonsingular quadric $Q$ at distinct points $A, B \in Q$ are
distinct. Thus we obtain the following result.


Lemma 11.3 The same hyperplane cannot coincide with the tangent hyperplanes 
to a nonsingular quadric Q at two distinct points. 


This means that in writing a hyperplane of the space $\mathbb{P}(L)$ in the form $T_A Q$, we
may omit the point $A$. And in the case of a nonsingular quadric $Q$, it makes sense
to say that the hyperplane is tangent to the quadric, and moreover, the point of
tangency $A \in Q$ is uniquely determined.

Let us now consider more concretely what the set $\Phi(Q)$ looks like. We shall
show that it is also a nonsingular quadric, that is, in some (and therefore in any)
basis of the space $L^*$ it is determined by the equation $q(x) = 0$, where $q$ is a
nonsingular quadratic form.

We saw above that there is an isomorphism $\mathcal{A} : L \to L^*$ that bijectively maps $Q$
to $\Phi(Q)$. Therefore, there exists as well an inverse transformation $\mathcal{A}^{-1} : L^* \to L$,
which is also an isomorphism. Then the condition $y \in \Phi(Q)$ is equivalent to
$\mathcal{A}^{-1}(y) \in Q$. Let us choose an arbitrary basis


$$f_0, f_1, \ldots, f_n \tag{11.15}$$


in the space $L^*$. The isomorphism $\mathcal{A}^{-1} : L^* \to L$ carries this basis to the basis


$$\mathcal{A}^{-1}(f_0), \mathcal{A}^{-1}(f_1), \ldots, \mathcal{A}^{-1}(f_n) \tag{11.16}$$


of the space $L$. Here obviously the coordinates of the vector $\mathcal{A}^{-1}(y)$ in the basis
(11.16) coincide with the coordinates of the vector $y$ in the basis (11.15). As we
saw above, the condition $\mathcal{A}^{-1}(y) \in Q$ is equivalent to the relationship


$$F(\alpha_0, \alpha_1, \ldots, \alpha_n) = 0, \tag{11.17}$$


where $F$ is a nonsingular quadratic form, and $(\alpha_0, \alpha_1, \ldots, \alpha_n)$ are the coordinates
of the vector $\mathcal{A}^{-1}(y)$ in some basis of the space $L$, for instance, in the basis (11.16).
This means that the condition $y \in \Phi(Q)$ can be expressed by the same relationship
(11.17). Thus we have proved the following statement.


Theorem 11.4 If $Q$ is a nonsingular quadric in the space $\mathbb{P}(L)$, then the set of
tangent hyperplanes to it forms a nonsingular quadric in the space $\mathbb{P}(L^*)$.
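In matrix terms the statement can be checked in one line. The computation below is a sketch under an assumption made only for this illustration: coordinates are taken in a basis $e_0, \ldots, e_n$ of $L$ and in the dual basis of $L^*$, and $A = (a_{ij})$ denotes the symmetric matrix of $F$, so that $F(x) = x^{T} A x$ for a coordinate column $x$. In these coordinates the set of tangent hyperplanes is given by the quadratic form with matrix $A^{-1}$.

```latex
% Assumption: A = (a_{ij}) is symmetric and nonsingular, F(x) = x^T A x,
% and y = A x is the coordinate column of the tangent hyperplane at x.
\[
  y = A x, \quad x^{T} A x = 0
  \;\Longrightarrow\;
  y^{T} A^{-1} y = x^{T} A^{T} A^{-1} A x = x^{T} A x = 0 .
\]
```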


Repeating verbatim the arguments presented in Sect. 9.1, we may extend the 
duality principle formulated there. Namely, we can add to it some additional notions 
that are dual to each other that can be interchanged so that the general assertion 
formulated on p. 326 remains valid: 


nonsingular quadric in P(L)       |  nonsingular quadric in P(L*)
point in a nonsingular quadric    |  hyperplane tangent to a nonsingular quadric


This (seemingly small) extension of the duality principle leads to completely 
unexpected results. By way of an example, we shall introduce two famous theorems 
that are duals of each other, that is, equivalent on the basis of the duality principle. 
Yet the second of them was published 150 years after the first. These theorems relate 
to quadrics in two-dimensional projective space, that is, in the projective plane. In 
this case, a quadric is called a conic.¹

In the sequel, we shall use the following terminology. Let $Q$ be a nonsingular
conic, and let $A_1, \ldots, A_6$ be six distinct points of $Q$. This ordered (that is, their
order is significant) collection of points is called a hexagon inscribed in the conic $Q$.
For two distinct points $A$ and $B$ of the projective plane, their projective cover (that
is, the line passing through them) is denoted by $AB$ (cf. the definition on p. 325).
The six lines $A_1A_2, A_2A_3, \ldots, A_5A_6, A_6A_1$ are called the sides of the hexagon.²
Here the following pairs of sides will be called opposite sides: $A_1A_2$ and $A_4A_5$,
$A_2A_3$ and $A_5A_6$, $A_3A_4$ and $A_6A_1$.


Theorem 11.5 (Pascal’s theorem) Pairs of opposite sides of an arbitrary hexagon
inscribed in a nonsingular conic intersect in three collinear points. See Fig. 11.1.
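Pascal's theorem is easy to test numerically. The sketch below makes several assumptions that are not part of the text: the conic is $x_0^2 + x_1^2 = x_2^2$, its points are produced by a rational parametrization, and lines and intersection points in the plane are computed with cross products of homogeneous coordinates. Three points are collinear exactly when the determinant of their coordinates vanishes.

```python
from fractions import Fraction

def cross(u, v):
    """Cross product: the line through two points, or the common point of two lines."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def det3(a, b, c):
    """Determinant of the 3 x 3 matrix with rows a, b, c."""
    return sum(x * y for x, y in zip(a, cross(b, c)))

def conic_point(t):
    """Rational point (1 - t^2 : 2t : 1 + t^2) of the conic x0^2 + x1^2 = x2^2."""
    t = Fraction(t)
    return [1 - t * t, 2 * t, 1 + t * t]

def pascal_determinant(hexagon):
    """Determinant of the three intersection points of opposite sides."""
    a1, a2, a3, a4, a5, a6 = hexagon
    p = cross(cross(a1, a2), cross(a4, a5))   # A1A2 meets A4A5
    q = cross(cross(a2, a3), cross(a5, a6))   # A2A3 meets A5A6
    r = cross(cross(a3, a4), cross(a6, a1))   # A3A4 meets A6A1
    return det3(p, q, r)

# Six distinct parameter values give six distinct points of the conic.
inscribed = [conic_point(t) for t in (0, 1, 2, 3, -2, Fraction(1, 2))]
print(pascal_determinant(inscribed))   # Pascal's theorem predicts 0
```

Exact rational arithmetic is used so that the vanishing of the determinant is not blurred by rounding.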


¹ A clarification of this term, that is, an explanation of what this has to do with a cone, will be given
somewhat later.


² Here we move away somewhat from the intuition of elementary geometry, where by a side we
mean not the entire line passing through two points, but only the segment connecting them. This
extended notion of a side is necessary if we wish to include the case of an arbitrary field $K$, for
instance, $K = \mathbb{C}$.




Fig. 11.1 Hexagon inscribed 
in a conic 


Before formulating the dual theorem to Pascal’s theorem, let us make a few comments.

With the selection of a homogeneous system of coordinates $(x_0 : x_1 : x_2)$ in the
projective plane, the equation of the conic $Q$ can be written in the form


$$F(x_0 : x_1 : x_2) = a_1 x_0^2 + a_2 x_0 x_1 + a_3 x_0 x_2 + a_4 x_1^2 + a_5 x_1 x_2 + a_6 x_2^2 = 0.$$


There are six coefficients on the right-hand side of this equation. If we have $k$ points
$A_1, \ldots, A_k$, then the condition of their belonging to the conic $Q$ reduces to the
relationships


$$F(A_i) = 0, \qquad i = 1, \ldots, k, \tag{11.18}$$


which yield a system consisting of $k$ linear homogeneous equations in the six
unknowns $a_1, \ldots, a_6$. We must find a nontrivial solution to this system. If we have
$k = 6$, then this question falls under Corollary 2.13 as a special case (and this
explains our interest in hexagons inscribed in a conic). By this corollary, we have still
to verify that the determinant of the system (11.18) for $k = 6$ is equal to zero. It is
Pascal’s theorem that gives a geometric interpretation of this condition.
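The determinant condition can be illustrated directly. In the sketch below, the sample conic $x_0^2 + x_1^2 = x_2^2$, the chosen points, and the function names are assumptions made for the example; only the shape of the system (11.18) is taken from the text. Each point contributes one row of monomials in the order of the coefficients $a_1, \ldots, a_6$.

```python
from fractions import Fraction

def conic_row(p):
    """Row of system (11.18): the monomials multiplying a_1, ..., a_6 at the point p."""
    x0, x1, x2 = p
    return [x0 * x0, x0 * x1, x0 * x2, x1 * x1, x1 * x2, x2 * x2]

def det(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return result

def conic_point(t):
    """Rational point (1 - t^2 : 2t : 1 + t^2) of the conic x0^2 + x1^2 = x2^2."""
    t = Fraction(t)
    return [1 - t * t, 2 * t, 1 + t * t]

on_conic = [conic_point(t) for t in (0, 1, 2, 3, -2, Fraction(1, 2))]
# Replace the sixth point by (1 : 1 : 1), which does not lie on the sample conic.
off_conic = on_conic[:5] + [[Fraction(1), Fraction(1), Fraction(1)]]

det_on = det([conic_row(p) for p in on_conic])     # zero: a conic through all six exists
det_off = det([conic_row(p) for p in off_conic])   # nonzero: no conic through these six
```

In the second case the first five points already determine the sample conic uniquely, so moving the sixth point off it leaves only the trivial solution.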

It is not difficult to show that it gives necessary and sufficient conditions for
six points $A_1, \ldots, A_6$ to lie on some conic if we restrict ourselves, first of all, to
nonsingular conics, and secondly, to such collections of six points that no three
of them are collinear (this is proved in any sufficiently rigorous course in analytic
geometry).
Now let us formulate the dual theorem to Pascal’s theorem. Here six distinct
lines $L_1, \ldots, L_6$ tangent to a conic $Q$ will be called a hexagon circumscribed about
the conic. Points $L_1 \cap L_2$, $L_2 \cap L_3$, $L_3 \cap L_4$, $L_4 \cap L_5$, $L_5 \cap L_6$, and $L_6 \cap L_1$ are
called the vertices of the hexagon. Here the following pairs of vertices will be called
opposite: $L_1 \cap L_2$ and $L_4 \cap L_5$, $L_2 \cap L_3$ and $L_5 \cap L_6$, $L_3 \cap L_4$ and $L_6 \cap L_1$.


Theorem 11.6 (Brianchon’s theorem) The lines connecting opposite vertices of an 
arbitrary hexagon circumscribed about a nonsingular conic intersect at a common 
point. See Fig. 11.2. 
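Brianchon's theorem admits the same kind of numerical check. As a sketch only, and with the same illustrative assumptions as one might use for Pascal's theorem (the conic $x_0^2 + x_1^2 = x_2^2$, a rational parametrization, cross products for joins and intersections; none of these names come from the text), the tangent line at a point $(p_0 : p_1 : p_2)$ of this conic has coefficients $(p_0, p_1, -p_2)$, and three lines pass through a common point exactly when the determinant of their coefficients vanishes.

```python
from fractions import Fraction

def cross(u, v):
    """Cross product: the line through two points, or the common point of two lines."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def det3(a, b, c):
    """Determinant of the 3 x 3 matrix with rows a, b, c."""
    return sum(x * y for x, y in zip(a, cross(b, c)))

def conic_point(t):
    """Rational point (1 - t^2 : 2t : 1 + t^2) of the conic x0^2 + x1^2 = x2^2."""
    t = Fraction(t)
    return [1 - t * t, 2 * t, 1 + t * t]

def tangent(p):
    """Tangent line to x0^2 + x1^2 - x2^2 = 0 at p, as coefficients of a linear form."""
    return [p[0], p[1], -p[2]]

def brianchon_determinant(lines):
    """Determinant of the three lines joining opposite vertices (0 iff concurrent)."""
    # vertices[i] is the intersection of the i-th line with the next one, cyclically
    vertices = [cross(lines[i], lines[(i + 1) % 6]) for i in range(6)]
    diagonals = [cross(vertices[i], vertices[i + 3]) for i in range(3)]
    return det3(*diagonals)

circumscribed = [tangent(conic_point(t)) for t in (0, 1, 2, 3, -2, Fraction(1, 2))]
print(brianchon_determinant(circumscribed))   # Brianchon's theorem predicts 0
```

The code is the dual of the Pascal computation: points and lines change roles, while the cross product serves for both joins and intersections.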




Fig. 11.2 Hexagon 
circumscribed about a conic 


It is obvious that Brianchon’s theorem is obtained from Pascal’s theorem if we 
replace in it all the concepts by their duals according to the rules given above. Thus 
by virtue of the general duality principle, Brianchon’s theorem follows from Pascal’s 
theorem. Pascal’s theorem itself can be proved easily, but we will not present a 
proof, since its logic is connected with another area, namely algebraic geometry.³
Here it is of interest to observe only that the duality principle makes it possible to 
obtain certain results from others that appear at first glance to be entirely unrelated. 
Indeed, Pascal proved his theorem in the seventeenth century (when he was 16 years 
old), while Brianchon proved his theorem in the nineteenth century, more than 150 
years later. And moreover, Brianchon used entirely different arguments (the general 
duality principle was not yet understood at the time). 


11.2 Quadrics in Complex Projective Space 


Let us now consider the projective space $\mathbb{P}(L)$, where $L$ is a complex vector space,
and as before, let us limit ourselves to the case of nonsingular quadrics. As we saw
in Sect. 6.3 (formula (6.27)), a nonsingular quadratic form in a complex space has
the canonical form $x_0^2 + x_1^2 + \cdots + x_n^2$. This means that in some coordinate system,
the equation of a nonsingular quadric can be written as


$$x_0^2 + x_1^2 + \cdots + x_n^2 = 0, \tag{11.19}$$


that is, every nonsingular quadric can be transformed into the quadric (11.19) by 
some projective transformation. In other words, in a complex projective space there 
exists (defined up to a projective transformation) only one nonsingular quadric 
(11.19). It is this quadric that we shall now investigate. 

³ Such a proof can be found, for example, in the book Algebraic Curves, by Robert Walker
(Springer, 1978).


In view of what we have said above, it suffices to consider any one arbitrary
nonsingular quadric on the projective space $\mathbb{P}(L)$ of a given dimension. For example,
we may choose the quadric given by the equation $F(x) = 0$, where the matrix of the
quadratic form $F(x)$ has the form


$$\begin{pmatrix}
0 & 0 & \cdots & 0 & 1 \\
0 & 0 & \cdots & 1 & 0 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 1 & \cdots & 0 & 0 \\
1 & 0 & \cdots & 0 & 0
\end{pmatrix}. \tag{11.20}$$

A simple calculation shows that the determinant of the matrix (11.20) is equal to +1 
or —1, that is, it is nonzero. 
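The sign can be made explicit. The following short computation is a sketch; the symbol $N$ for the order of the matrix (11.20) and the permutation $\sigma$ are notation introduced only here. In the expansion of the determinant as a sum over permutations, a single term is nonzero, namely the one for the permutation that reverses the order of the indices.

```latex
% Assumption: J_N denotes the matrix (11.20) of order N, with ones on the anti-diagonal.
% The only nonzero term corresponds to sigma(i) = N + 1 - i, which has N(N-1)/2 inversions.
\[
  \det J_N = \operatorname{sgn}(\sigma) = (-1)^{N(N-1)/2},
  \qquad
  \det J_2 = -1, \quad \det J_3 = -1, \quad \det J_4 = 1 .
\]
```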

A fundamental topic that we shall study in this and the following sections is
projective subspaces contained in a quadric. Let the quadric $Q$ be given by the
equation $F(x) = 0$, where $x \in L$, and let a projective subspace have the form $\mathbb{P}(L')$,
where $L'$ is a subspace of the vector space $L$. Then the projective subspace $\mathbb{P}(L')$ is
contained in $Q$ if and only if $F(x) = 0$ for all vectors $x \in L'$.


Definition 11.7 A subspace $L' \subset L$ is said to be isotropic with respect to a quadratic
form $F$ if $F(x) = 0$ for all vectors $x \in L'$.


Let $\varphi$ be the symmetric bilinear form associated with the quadratic form $F$,
according to Theorem 6.6. Then by virtue of (6.14), we see that $\varphi(x, y) = 0$ for all
vectors $x, y \in L'$. Therefore, we shall also say that the subspace $L' \subset L$ is isotropic
with respect to the bilinear form $\varphi$.

We have already encountered the simplest example of isotropic subspaces, in
Sect. 7.7 in our study of pseudo-Euclidean spaces. There we encountered lightlike
(also called isotropic) vectors on which a quadratic form $(x^2)$ defining a
pseudo-Euclidean space becomes zero. Every nonnull lightlike vector $e$ clearly determines
a one-dimensional isotropic subspace $\langle e \rangle$.

The basic technique that will be used in this and the following sections consists in
how to reformulate our questions about subspaces contained in a quadric $F(x) = 0$
in terms of a vector space $L$, a symmetric bilinear form $\varphi(x, y)$ defined on $L$ and
corresponding to the quadratic form $F(x)$, and subspaces isotropic with respect to
$F$ and $\varphi$. Then everything is determined almost trivially on the basis of the simplest
properties of linear and bilinear forms.


Theorem 11.8 The dimension of an arbitrary isotropic subspace $L' \subset L$ relative to
an arbitrary nonsingular quadratic form $F$ does not exceed half of $\dim L$.


Proof Let us consider $(L')^\perp$, the orthogonal complement of the subspace $L' \subset L$
with respect to the bilinear form $\varphi(u, v)$ associated with $F(x)$. The quadratic form
$F(x)$ and bilinear form $\varphi(u, v)$ are nonsingular. Therefore, we have relationship
(7.75), from which follows the equality $\dim (L')^\perp = \dim L - \dim L'$.

That the space $L'$ is isotropic means that $L' \subset (L')^\perp$. From this we obtain the
inequality


$$\dim L' \le \dim (L')^\perp = \dim L - \dim L',$$


from which it follows that $\dim L' \le \frac{1}{2} \dim L$, as asserted in the theorem.


In the sequel, we shall limit our study of isotropic subspaces to those of the
greatest possible dimension, namely $\frac{1}{2} \dim L$ when the number $\dim L$ is even and
$\frac{1}{2}(\dim L - 1)$ when it is odd. The general case $\dim L' \le \frac{1}{2} \dim L$ is easily reduced to
this limiting case and is studied completely analogously.

Let us consider some of the simplest cases, known from analytic geometry. 


Example 11.9 The simplest case of all is $\dim L = 2$, and therefore, $\dim \mathbb{P}(L) = 1$.
In coordinates $(x_0 : x_1)$, the quadratic form with matrix (11.20) has the form $x_0 x_1$.
Clearly, the quadric $x_0 x_1 = 0$ consists of two points $(0 : 1)$ and $(1 : 0)$, corresponding
to the vectors $e_1 = (0, 1)$ and $e_2 = (1, 0)$ in the plane $L$. Each of the two points
determines an isotropic subspace $L_i = \langle e_i \rangle$.


Example 11.10 Next in complexity is the case dimL = 3, and correspondingly, 
dim P(L) = 2. In this case, we are dealing with quadrics in the projective plane; 
their points determine one-dimensional isotropic subspaces in L that therefore form 
a continuous family. (If the equation of the quadric is $F(x_0, x_1, x_2) = 0$, then in the
space L, it determines a quadratic cone whose generatrices are isotropic subspaces.) 


Example 11.11 The following case corresponds to $\dim L = 4$ and $\dim \mathbb{P}(L) = 3$.
These are quadrics in three-dimensional projective space. For isotropic subspaces
$L' \subset L$, Theorem 11.8 gives $\dim L' \le 2$. Isotropic subspaces of maximal dimension
are obtained for $\dim L' = 2$, that is, $\dim \mathbb{P}(L') = 1$. These are projective lines lying
on the quadric. In coordinates $(x_0 : x_1 : y_0 : y_1)$, the quadratic form with matrix
(11.20) gives the equation


$$x_0 y_0 + x_1 y_1 = 0. \tag{11.21}$$


We must find all two-dimensional isotropic subspaces $L' \subset L$. Let a basis of
the two-dimensional subspace $L'$ consist of vectors $e = (a_0, a_1, b_0, b_1)$ and
$e' = (a_0', a_1', b_0', b_1')$. Then the fact that $L'$ is isotropic is expressed, in view of formula
(11.21), by the relationship


$$(a_0 u + a_0' v)(b_0 u + b_0' v) + (a_1 u + a_1' v)(b_1 u + b_1' v) = 0, \tag{11.22}$$


which is satisfied identically for all $u$ and $v$. The left-hand side of equation (11.22)
represents a quadratic form in the variables $u$ and $v$, which can be identically equal
to zero only in the case that all its coefficients are equal to zero. Removing
parentheses in (11.22), we obtain


$$\begin{aligned}
a_0 b_0 + a_1 b_1 &= 0, \\
a_0 b_0' + a_0' b_0 + a_1 b_1' + a_1' b_1 &= 0, \\
a_0' b_0' + a_1' b_1' &= 0.
\end{aligned} \tag{11.23}$$


The first equation from (11.23) means that the rows $(a_0, a_1)$ and $(b_1, -b_0)$ are
proportional. Since they cannot both be equal to zero simultaneously (then all
coordinates of the basis vector $e$ would be equal to zero, which is impossible), it follows
that one of them is the product of the other and some (uniquely determined) scalar $\beta$.
For definiteness, let $a_0 = \beta b_1$, $a_1 = -\beta b_0$ (the case $b_1 = \beta a_0$, $b_0 = -\beta a_1$ is
considered analogously). In just the same way, from the third equation of (11.23), we
obtain that $a_0' = \gamma b_1'$, $a_1' = -\gamma b_0'$ with some scalar $\gamma$. Substituting the relationships


$$a_0 = \beta b_1, \quad a_1 = -\beta b_0, \quad a_0' = \gamma b_1', \quad a_1' = -\gamma b_0' \tag{11.24}$$


into the second equation of (11.23), we obtain the equality
$(\beta - \gamma)(b_0' b_1 - b_0 b_1') = 0$. Therefore, either $b_0' b_1 - b_0 b_1' = 0$ or $\gamma = \beta$.

In the first case, from the equality $b_0' b_1 - b_0 b_1' = 0$ it follows that the rows
$(b_0, b_0')$ and $(b_1, b_1')$ are proportional, and we obtain the relationships $b_1 = -\alpha b_0$
and $b_1' = -\alpha b_0'$ with some scalar $\alpha$ (the case $b_0 = -\alpha b_1$ and $b_0' = -\alpha b_1'$ is
considered similarly). Let us assume that $b_1$ and $b_1'$ are not both equal to zero. Then $\alpha \ne 0$,
and taking into account the relationships (11.24), we obtain


$$\begin{aligned}
a_0 u + a_0' v &= \beta b_1 u + \gamma b_1' v = -\alpha(\beta b_0 u + \gamma b_0' v) = \alpha(a_1 u + a_1' v), \\
b_0 u + b_0' v &= -\alpha^{-1}(b_1 u + b_1' v).
\end{aligned}$$

In the second case, let us suppose that $a_0$ and $a_0'$ are not both equal to zero. Then
$\beta \ne 0$, and taking into account relationship (11.24), we obtain

$$\begin{aligned}
a_0 u + a_0' v &= \beta(b_1 u + b_1' v), \\
b_0 u + b_0' v &= -\beta^{-1}(a_1 u + a_1' v).
\end{aligned}$$


Thus with the assumptions made, for an arbitrary vector of the subspace $L'$ with
coordinates $(x_0, x_1, y_0, y_1)$, we have either


$$x_0 = \alpha x_1, \quad y_0 = -\alpha^{-1} y_1 \tag{11.25}$$


or


$$x_0 = \beta y_1, \quad y_0 = -\beta^{-1} x_1, \tag{11.26}$$


where $\alpha$ and $\beta$ are certain nonzero scalars.

In order to consider the excluded cases, namely $\alpha = 0$ ($b_1 = b_1' = 0$) and $\beta = 0$
($a_0 = a_0' = 0$), let us introduce points $(a : b) \in \mathbb{P}^1$ and $(c : d) \in \mathbb{P}^1$, that is, pairs
of numbers that are not simultaneously equal to zero, and let us consider them as
defined up to multiplication by one and the same nonzero scalar. Then as is easily
verified, a homogeneous representation of relationships (11.25) and (11.26) that also
includes both previously excluded cases will have the form


$$a x_0 = b x_1, \quad b y_0 = -a y_1 \tag{11.27}$$


and


$$c x_0 = d y_1, \quad d y_0 = -c x_1 \tag{11.28}$$


respectively. Indeed, equality (11.25) is obtained from (11.27) for $a = 1$ and $b = \alpha$,
while (11.26) is obtained from (11.28) for $c = 1$ and $d = \beta$.

Relationships (11.27) give the isotropic plane $L' \subset L$ or the line $\mathbb{P}(L')$ in $\mathbb{P}(L)$,
which belongs to the quadric (11.21). It is determined by the point $(a : b) \in \mathbb{P}^1$. Thus
we obtain one family of lines. Similarly, relationships (11.28) determine a second
family of lines. Together, they give all the lines contained in our quadric (called a
hyperboloid of one sheet). These lines are called the rectilinear generatrices of the
hyperboloid.

On the basis of the formulas we have written down, it is easy to verify some 
properties known from analytic geometry: two distinct lines from one family of 
rectilinear generatrices do not intersect, while two lines from different families do 
intersect (at a single point). For every point of the hyperboloid, there is a line from 
each of the two families that passes through it. 
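These properties can be verified mechanically from (11.27) and (11.28). The sketch below is an illustration under stated assumptions: vectors are written in the coordinate order $(x_0, x_1, y_0, y_1)$, the explicit bases of the two families of planes are written out here by solving the two pairs of equations and are not taken from the text, and the helper names are hypothetical. Two planes of a four-dimensional space intersect in a subspace of dimension $4 - r$, where $r$ is the rank of the four basis vectors taken together.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def F(v):
    """The quadratic form of equation (11.21)."""
    x0, x1, y0, y1 = v
    return x0 * y0 + x1 * y1

def first_family(a, b):
    """Basis of the plane a*x0 = b*x1, b*y0 = -a*y1 of (11.27)."""
    return [[b, a, 0, 0], [0, 0, a, -b]]

def second_family(c, d):
    """Basis of the plane c*x0 = d*y1, d*y0 = -c*x1 of (11.28)."""
    return [[d, 0, 0, c], [0, d, -c, 0]]

def is_isotropic(basis):
    """F vanishes on the plane iff it vanishes on e, e' and e + e'."""
    e1, e2 = basis
    return F(e1) == 0 and F(e2) == 0 and F([p + q for p, q in zip(e1, e2)]) == 0

def intersection_dim(p, q):
    """Dimension of the intersection of two planes of the four-dimensional space."""
    return 4 - rank(p + q)

print(intersection_dim(first_family(1, 2), first_family(1, 5)))    # same family
print(intersection_dim(first_family(1, 2), second_family(1, 3)))   # different families
```

An intersection of dimension 0 means that the corresponding projective lines do not meet, and an intersection of dimension 1 means that they meet in a single point.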


In the following section, we shall consider the general case of projective sub- 
spaces of maximum possible dimension on a nonsingular quadric of arbitrary di- 
mension in complex projective space. 


11.3 Isotropic Subspaces 


Let $Q$ be a nonsingular quadric in a complex projective space $\mathbb{P}(L)$ given by the
equation $F(x) = 0$, where $F(x)$ is a nonsingular quadratic form on the space $L$. In
analogy to what we discussed in the previous section, we shall study $m$-dimensional
subspaces $L' \subset L$ that are isotropic with respect to $F$, assuming that $\dim L = 2m$ if
$\dim L$ is even, and $\dim L = 2m + 1$ if $\dim L$ is odd.

The special cases that we studied in the preceding section show that isotropic 
subspaces look different for different values of dimL. Thus for dimL = 3, we found 
one family of isotropic subspaces, continuously parameterized by the points of the 
quadric Q. For dimL = 2 or 4, we found two such families. This leads to the idea 
that the number of continuously parameterized families of isotropic subspaces on 
a quadric depends on the parity of the number dimL. As we shall now see, such is 
indeed the case. 

The cases of even and odd dimension will be treated separately. 


Case 1. Let us assume that $\dim L = 2m$. Consequently, we are interested in isotropic
subspaces $M \subset L$ of dimension $m$. (This is the most interesting case, since here we
shall see how the families of lines on a hyperboloid of one sheet are generalized.)


Theorem 11.12 For every $m$-dimensional isotropic subspace $M \subset L$, there exists
another $m$-dimensional isotropic subspace $N \subset L$ such that


$$L = M \oplus N. \tag{11.29}$$


Proof Our proof is by induction on the number $m$. For $m = 0$, the statement of the
theorem is vacuously true.

Let us assume now that $m > 0$, and let us consider an arbitrary nonnull vector
$e \in M$. Let $\varphi(x, y)$ be the symmetric bilinear form associated with the quadratic
form $F(x)$. Since the subspace $M$ is isotropic, it follows that $\varphi(e, e) = 0$. In view of
the nonsingularity of $F(x)$, the bilinear form $\varphi(x, y)$ is likewise nonsingular, and
therefore, its radical is equal to $(0)$. Then the linear function $\varphi(e, x)$ of a vector
$x \in L$ is not identically equal to zero (otherwise, the vector $e$ would be in the radical
of $\varphi(x, y)$, which is equal to $(0)$).

Let $f \in L$ be a vector such that $\varphi(e, f) \ne 0$. Clearly, the vectors $e, f$ are linearly
independent. Let us consider the plane $W = \langle e, f \rangle$ and denote by $\varphi'$ the restriction
of the bilinear form $\varphi$ to $W$. In the basis $e, f$, the matrix of the bilinear form $\varphi'$ has
the form


$$\Phi' = \begin{pmatrix} 0 & \varphi(e, f) \\ \varphi(e, f) & \varphi(f, f) \end{pmatrix}, \qquad \varphi(e, f) \ne 0.$$

It is obvious that $|\Phi'| = -\varphi(e, f)^2 \ne 0$, and therefore, the bilinear form $\varphi'$ is
nonsingular.

Let us define the vector

$$g = f - \frac{\varphi(f, f)}{2\varphi(e, f)}\, e.$$

Then as is easily verified, $\varphi(g, g) = 0$, $\varphi(e, g) = \varphi(e, f) \ne 0$, and the vectors $e, g$
are linearly independent, that is, $W = \langle e, g \rangle$. In the basis $e, g$, the matrix of the
bilinear form $\varphi'$ has the form


$$\Phi'' = \begin{pmatrix} 0 & \varphi(e, g) \\ \varphi(e, g) & 0 \end{pmatrix}.$$

As a result of the nondegeneracy of the bilinear form $\varphi'$, we have by Theorem 6.9
the decomposition


$$L = W \oplus L_1, \qquad L_1 = W^\perp, \tag{11.30}$$


where $\dim L_1 = 2m - 2$. Let us set $M_1 = L_1 \cap M$ and show that $M_1$ is a subspace of
dimension $m - 1$ isotropic with respect to the restriction of the bilinear form $\varphi$ to
$L_1$.

By construction, the subspace $M_1$ consists of the vectors $x \in M$ such that
$\varphi(x, e) = 0$ and $\varphi(x, g) = 0$. But the first equality holds in general for all $x \in M$,
since $e \in M$ and $M$ is isotropic with respect to $\varphi$. Thus in the definition of the
subspace $M_1$, there remains only the second equality, which means that $M_1 \subset M$ is
determined by what is sent to zero by the linear function $f(x) = \varphi(x, g)$, which
is not identically equal to zero (since $f(e) = \varphi(e, g) \ne 0$). Therefore,
$\dim M_1 = \dim M - 1 = m - 1$.


Thus $M_1$ is a subspace of $L_1$ of half the dimension of $L_1$, defined by formula
(11.30), and we can apply the induction hypothesis to it to obtain the decomposition


$$L_1 = M_1 \oplus N_1, \tag{11.31}$$


where $N_1 \subset L_1$ is some other $(m - 1)$-dimensional isotropic subspace.

Let us note that $M = \langle e \rangle \oplus M_1$ and let us set $N = \langle g \rangle \oplus N_1$. Since
$N_1 \subset L_1 = W^\perp$ and $g \in W$, we have for all vectors $x \in N_1$ the equality $\varphi(g, x) = 0$.
Taking into account that $\varphi(g, g) = 0$ and that the subspace $N_1$ is isotropic in $L_1$, we
see that the subspace $N$ is isotropic in $L$. Formulas (11.30) and (11.31) together give
the decomposition


$$L = \langle e \rangle \oplus \langle g \rangle \oplus M_1 \oplus N_1 = M \oplus N,$$


which is what was to be proved.


In the terminology of Theorem 11.12, an arbitrary vector $z \in N$ determines a
linear function $f(x) = \varphi(z, x)$ on the vector space $L$, that is, an element of the
dual space $L^*$. The restriction of this function to the subspace $M \subset L$ is obviously a
linear function on $M$, that is, an element of the space $M^*$. This defines the mapping
$\mathcal{F} : N \to M^*$. A trivial verification shows that $\mathcal{F}$ is a linear transformation.

The decomposition (11.29) established by Theorem 11.12 has an interesting
consequence.


Lemma 11.13 The linear transformation $\mathcal{F} : N \to M^*$ constructed above is an
isomorphism.


Proof Let us determine the kernel of the transformation $\mathcal{F} : N \to M^*$. Let us assume
that $\mathcal{F}(z_0) = 0$ for some $z_0 \in N$, that is, $\varphi(z_0, y) = 0$ for all vectors $y \in M$. But by
Theorem 11.12, every vector $x \in L$ can be represented in the form $x = y + z$, where
$y \in M$ and $z \in N$. Thus


$$\varphi(z_0, x) = \varphi(z_0, y) + \varphi(z_0, z) = \varphi(z_0, z) = 0,$$


since both vectors $z$ and $z_0$ belong to the isotropic subspace $N$. From the
nonsingularity of the bilinear form $\varphi$, it then follows that $z_0 = 0$, that is, the kernel of $\mathcal{F}$
consists of only the null vector. Since $\dim M = \dim N$, we have by Theorem 3.68
that the linear transformation $\mathcal{F}$ is an isomorphism.


Let $e_1, \ldots, e_m$ be some basis in $M$, and $f_1, \ldots, f_m$ the dual basis in $M^*$. The
isomorphism $\mathcal{F}$ that we constructed creates a correspondence between this dual basis
and a certain basis $g_1, \ldots, g_m$ in the space $N$ according to the formula $\mathcal{F}(g_i) = f_i$.
From decomposition (11.29) established in Theorem 11.12, it follows that vectors
$e_1, \ldots, e_m, g_1, \ldots, g_m$ form a basis in $L$. In this basis, the bilinear form $\varphi$ has the
simplest possible matrix $\Phi$. Indeed, recalling the definitions of concepts that we
have used, we obtain that

$$\Phi = \begin{pmatrix} 0 & E \\ E & 0 \end{pmatrix}, \tag{11.32}$$

where $E$ and $0$ are the identity and zero matrices of order $m$. For the corresponding
quadratic form $F$ and vector


$$x = x_1 e_1 + \cdots + x_m e_m + x_{m+1} g_1 + \cdots + x_{2m} g_m,$$


we obtain

$$F(x) = 2 \sum_{i=1}^{m} x_i x_{m+i}. \tag{11.33}$$

Conversely, if in some basis $e_1, \ldots, e_{2m}$ of the vector space $L$, the bilinear form $\varphi$
has matrix (11.32), then the space $L$ can be represented in the form


$$L = M \oplus N, \qquad M = \langle e_1, \ldots, e_m \rangle, \quad N = \langle e_{m+1}, \ldots, e_{2m} \rangle,$$


in accordance with Theorem 11.12. Let us recall that in our case (in a complex
projective space), all nonsingular bilinear forms are equivalent, and therefore, every
nonsingular bilinear form $\varphi$ has matrix (11.32) in some basis. In particular, we see
that in the $2m$-dimensional space $L$, there exists an $m$-dimensional isotropic
subspace $M$.

In order to generalize known results from analytic geometry for m = 2 to the case 
of arbitrary m (see Example 11.11), we shall provide several definitions that natu- 
rally generalize some concepts about Euclidean spaces familiar to us from Chap. 7. 


Definition 11.14 Let $\varphi(x, y)$ be a nonsingular symmetric bilinear form in the space
$L$ of arbitrary dimension. A linear transformation $\mathcal{U} : L \to L$ is said to be orthogonal
with respect to $\varphi$ if


$$\varphi(\mathcal{U}(x), \mathcal{U}(y)) = \varphi(x, y) \tag{11.34}$$


for all vectors $x, y \in L$.


This definition generalizes the notion of orthogonal transformation of a
Euclidean space and Lorentz transformation of a pseudo-Euclidean space. Similarly,
we shall call a basis $e_1, \ldots, e_n$ of a space $L$ orthonormal with respect to a bilinear
form $\varphi$ if $\varphi(e_i, e_i) = 1$ and $\varphi(e_i, e_j) = 0$ for all $i \ne j$. Every orthogonal
transformation takes an orthonormal basis into an orthonormal basis, and for any two
orthonormal bases, there exists a unique orthogonal transformation taking the first
of them to the second. The proofs of these assertions coincide word for word with
the analogous assertions from Sect. 7.2, since there we nowhere used the positive
definiteness of the bilinear form $(x, y)$, but only its nonsingularity.

The condition (11.34) can be expressed in matrix form. Let the bilinear form
$\varphi$ have matrix $\Phi$ in some basis $e_1, \ldots, e_n$ of the space $L$. Then the transformation
$\mathcal{U} : L \to L$ will be orthogonal with respect to $\varphi$ if and only if its matrix $U$ in this
basis satisfies the relationship


$$U^* \Phi U = \Phi. \tag{11.35}$$


This is proved just as was the analogous equality (7.18) for orthogonal
transformations of Euclidean spaces, and (7.18) is a special case of formula (11.35) for $\Phi = E$.

It follows from formula (11.35) that $|U^*| \cdot |\Phi| \cdot |U| = |\Phi|$, and taking into account
the nonsingularity of the form $\varphi$ ($|\Phi| \ne 0$), that $|U^*| \cdot |U| = 1$, that is, $|U|^2 = 1$.
From this we finally obtain the equality $|U| = \pm 1$, in which $|U|$ can be replaced by
$|\mathcal{U}|$, since the determinant of a linear transformation does not depend on the choice
of basis in the space, and consequently, coincides with the determinant of the matrix
of this transformation.
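Relationship (11.35) and the resulting values of the determinant can be illustrated for $m = 2$ and the matrix (11.32). The following is a sketch only: the two sample matrices and all function names are assumptions chosen for the example. The first matrix preserves the subspace spanned by the first two basis vectors, and the second exchanges the first basis vector with the third.

```python
from fractions import Fraction

def transpose(a):
    """Transposed matrix (denoted U* in formula (11.35))."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def det(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return result

# Matrix (11.32) for m = 2.
PHI = [[0, 0, 1, 0],
       [0, 0, 0, 1],
       [1, 0, 0, 0],
       [0, 1, 0, 0]]

def is_orthogonal(u, phi=PHI):
    """Check relationship (11.35): U* Phi U = Phi."""
    return matmul(matmul(transpose(u), phi), u) == phi

# Block-diagonal example: the lower block is the inverse transpose of the upper block.
U_PROPER = [[1, 1, 0, 0],
            [0, 1, 0, 0],
            [0, 0, 1, 0],
            [0, 0, -1, 1]]

# Exchange of the first and third basis vectors.
U_IMPROPER = [[0, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1]]

print(is_orthogonal(U_PROPER), det(U_PROPER))
print(is_orthogonal(U_IMPROPER), det(U_IMPROPER))
```

Both sample matrices satisfy (11.35), and their determinants take the two possible values $1$ and $-1$.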

The equality $|\mathcal{U}| = \pm 1$ generalizes a well-known property of orthogonal
transformations of a Euclidean space and provides justification for an analogous
definition.


Definition 11.15 A linear transformation $\mathcal{U} : L \to L$ orthogonal with respect to a
symmetric bilinear form $\varphi$ is said to be proper if $|\mathcal{U}| = 1$ and improper if $|\mathcal{U}| = -1$.


It follows at once from Theorem 2.54 on the determinant of the product of
matrices that proper and improper transformations multiply just like the numbers $+1$
and $-1$. Similarly, the transformation $\mathcal{U}^{-1}$ corresponds to the same type (of proper
or improper orthogonal transformation) as $\mathcal{U}$.

The concepts that we have introduced can be applied to the theory of isotropic 
subspaces on the basis of the following result. 


Theorem 11.16 For any two $m$-dimensional isotropic subspaces $M$ and $M'$ of a
$2m$-dimensional space $L$, there exists an orthogonal transformation $\mathcal{U} : L \to L$ taking
one of the subspaces to the other.


Proof Since Theorem 11.12 can be applied to each of the subspaces $M$ and $M'$, there
exist $m$-dimensional isotropic subspaces $N$ and $N'$ such that


$$L = M \oplus N = M' \oplus N'.$$


As we have noted above, from the decomposition $L = M \oplus N$, it follows that in the
space $L$, there exists a basis $e_1, \ldots, e_{2m}$ comprising the bases of the subspaces $M$
and $N$ in which the matrix of the bilinear form $\varphi$ is equal to (11.32). The second
decomposition $L = M' \oplus N'$ gives us a similar basis $e_1', \ldots, e_{2m}'$.

Let us define the transformation $\mathcal{U}$ by the action on the vectors of the basis
$e_1, \ldots, e_{2m}$ according to the formula $\mathcal{U}(e_i) = e_i'$ for all $i = 1, \ldots, 2m$. It is obvious
that then the image $\mathcal{U}(M)$ is equal to $M'$. Furthermore, for any two vectors
$x = x_1 e_1 + \cdots + x_{2m} e_{2m}$ and $y = y_1 e_1 + \cdots + y_{2m} e_{2m}$, their images $\mathcal{U}(x)$ and $\mathcal{U}(y)$
have, in the basis $e_1', \ldots, e_{2m}'$, decompositions with the same coordinates:
$\mathcal{U}(x) = x_1 e_1' + \cdots + x_{2m} e_{2m}'$ and $\mathcal{U}(y) = y_1 e_1' + \cdots + y_{2m} e_{2m}'$. Since the form $\varphi$ has
the same matrix (11.32) in both bases, it follows that


$$\varphi(\mathcal{U}(x), \mathcal{U}(y)) = \sum_{i=1}^{m} (x_i y_{m+i} + x_{m+i} y_i) = \varphi(x, y),$$


showing that $\mathcal{U}$ is an orthogonal transformation.


Let us note that Theorem 11.16 does not assert the uniqueness of such a
transformation $\mathcal{U}$. In fact, such is not the case. Let us consider this question in more
detail. Let $\mathcal{U}_1$ and $\mathcal{U}_2$ be two orthogonal transformations that were the subject
of Theorem 11.16. Applying to both sides of the equality $\mathcal{U}_1(M) = \mathcal{U}_2(M)$ the
transformation $\mathcal{U}_1^{-1}$, we obtain $\mathcal{U}_0(M) = M$, where $\mathcal{U}_0 = \mathcal{U}_1^{-1} \mathcal{U}_2$ is also an orthogonal
transformation. Our further considerations are based on the following result.


Lemma 11.17 Let $M$ be an $m$-dimensional isotropic subspace of a $2m$-dimensional
space $L$, and let $\mathcal{U}_0 : L \to L$ be an orthogonal transformation taking $M$ to itself.
Then the transformation $\mathcal{U}_0$ is proper.


Proof By assumption, $M$ is an invariant subspace of the transformation $\mathcal{U}_0$. This
means that in an arbitrary basis of the space $L$ whose first $m$ vectors form a basis of
$M$, the matrix of the transformation $\mathcal{U}_0$ has the block form


$$U_0 = \begin{pmatrix} A & B \\ 0 & C \end{pmatrix}, \tag{11.36}$$


where $A, B, C$ are square matrices of order $m$.

The orthogonality of the transformation $\mathcal{U}_0$ is expressed by the relationship
(11.35), in which, as we have seen, with the selection of a suitable basis, we may
consider that relationship (11.32) is satisfied. Setting in (11.35) in place of $U$ the
matrix (11.36), we obtain


$$\begin{pmatrix} A^* & 0 \\ B^* & C^* \end{pmatrix}
\begin{pmatrix} 0 & E \\ E & 0 \end{pmatrix}
\begin{pmatrix} A & B \\ 0 & C \end{pmatrix}
= \begin{pmatrix} 0 & E \\ E & 0 \end{pmatrix}.$$

Multiplying the matrices on the left-hand side of this equality brings it into the form


$$\begin{pmatrix} 0 & A^* C \\ C^* A & D \end{pmatrix}
= \begin{pmatrix} 0 & E \\ E & 0 \end{pmatrix},
\qquad \text{where } D = C^* B + B^* C.$$


From this, we obtain in particular $A^* C = E$, and this means that $|A^*| \cdot |C| = 1$. But
in view of $|A^*| = |A|$, from (11.36) we have $|U_0| = |A| \cdot |C| = 1$, as asserted.


From Lemma 11.17 we deduce the following important corollary. 


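Theorem 11.18 If $M$ and $M'$ are two $m$-dimensional isotropic subspaces of a
$2m$-dimensional space $L$, then the orthogonal transformations $\mathcal{U} : L \to L$ taking one of
these subspaces into the other are either all proper or all improper.


Proof Let $\mathcal{U}_1$ and $\mathcal{U}_2$ be two orthogonal transformations such that
$\mathcal{U}_1(M) = \mathcal{U}_2(M) = M'$. It is clear that then $\mathcal{U}_1^{-1}(M') = M$. Setting $\mathcal{U}_0 = \mathcal{U}_1^{-1} \mathcal{U}_2$, from the
equality $\mathcal{U}_1(M) = \mathcal{U}_2(M)$ we obtain that $\mathcal{U}_0(M) = M$. By Lemma 11.17, $|\mathcal{U}_0| = 1$, and
from the relationship $\mathcal{U}_0 = \mathcal{U}_1^{-1} \mathcal{U}_2$, it follows that $|\mathcal{U}_1| = |\mathcal{U}_2|$.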




Theorem 11.18 determines in an obvious way a partition of the set of all
$m$-dimensional isotropic subspaces $M$ of a $2m$-dimensional space $L$ into two families
$\mathfrak{M}_1$ and $\mathfrak{M}_2$. Namely, $M$ and $M'$ belong to one family if an orthogonal
transformation $\mathcal{U}$ taking one of these subspaces into the other (which always exists, by
Theorem 11.16) is proper (it follows from Theorem 11.18 that this definition does
not depend on the choice of a specific transformation $\mathcal{U}$).

Now we can easily prove the following property, which was established in the 
previous section for m = 2, for any m. 


Theorem 11.19 Two $m$-dimensional isotropic subspaces $M$ and $M'$ of a
$2m$-dimensional space $L$ belong to one family $\mathfrak{M}_i$ if and only if the dimension of their
intersection $M \cap M'$ has the same parity as $m$.


Proof Let us recall that natural numbers $k$ and $m$ have the same parity if $k + m$
is even, or equivalently, if $(-1)^{k+m} = 1$. Recalling now the definition of the
partition of the set of $m$-dimensional isotropic subspaces into families $\mathfrak{M}_1$ and $\mathfrak{M}_2$ and
setting $k = \dim(M \cap M')$, we may formulate the assertion of the theorem as follows:


$$|\mathcal{U}| = (-1)^{k+m}, \tag{11.37}$$


where $\mathcal{U}$ is an arbitrary orthogonal transformation taking $M$ to $M'$, that is, a
transformation such that $\mathcal{U}(M) = M'$.

Let us begin the proof of relationship (11.37) with the case $k = 0$, that is, the case that $M \cap M' = (0)$. Then in view of the equality $\dim M + \dim M' = \dim L$, the sum of subspaces $M + M' = M \oplus M'$ coincides with the entire space $L$. This means that $M'$ exhibits all the properties of the isotropic subspace $N$ constructed for the proof of Theorem 11.12. In particular, there exist bases $e_1, \ldots, e_m$ in $M$ and $f_1, \ldots, f_m$ in $M'$ such that


$$\varphi(e_i, f_i) = 1 \quad \text{for } i = 1, \ldots, m, \qquad \varphi(e_i, f_j) = 0 \quad \text{for } i \ne j.$$


We shall determine the transformation $U : L \to L$ by the conditions $U(e_i) = f_i$ and $U(f_i) = e_i$ for all $i = 1, \ldots, m$. It is clear that $U(M) = M'$ and $U(M') = M$. It is equally easy to see that in the basis $e_1, \ldots, e_m, f_1, \ldots, f_m$, the matrices of the transformation $U$ and bilinear form $\varphi$ coincide and have the form (11.32). Substituting the matrix (11.32) in place of $U$ and $\Phi$ into formula (11.35), we see that it is converted to a true equality, that is, the transformation $U$ is orthogonal.

On the other hand, we have, therefore, the equality $|U| = |\Phi| = (-1)^m$. It is easy to convince oneself that $|\Phi| = (-1)^m$ by transposing the rows of the matrix (11.32) with indices $i$ and $m + i$ for all $i = 1, \ldots, m$. Here we shall carry out $m$ transpositions and obtain the identity matrix of order $2m$ with determinant 1. As a result, we arrive at the equality $|U| = (-1)^m$, that is, at relationship (11.37) for $k = 0$.
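The determinant computation for the case $k = 0$ can be checked numerically. The following is a minimal sketch, not part of the original text. It assumes that the orthogonality condition (11.35) reads $U^{T} \Phi U = \Phi$ and that (11.32) is the block matrix with zero diagonal blocks and identity off-diagonal blocks of order $m$; the helper names `phi`, `det`, and `check` are hypothetical.

```python
from fractions import Fraction


def det(a):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [[Fraction(x) for x in row] for row in a]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign  # each row transposition flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            a[r] = [x - factor * y for x, y in zip(a[r], a[c])]
    return sign * result


def phi(m):
    """Assumed shape of matrix (11.32): [[0, E], [E, 0]] with blocks of order m."""
    n = 2 * m
    return [[1 if (j - i) % n == m else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def check(m):
    """Return (is U orthogonal w.r.t. Phi, det U) for U with matrix Phi."""
    u = phi(m)  # in the basis e_1..e_m, f_1..f_m the matrix of U equals Phi
    orthogonal = matmul(matmul(transpose(u), u), u) == u  # U^T Phi U == Phi
    return orthogonal, det(u)
```

For each small $m$ one can call `check(m)` and compare the second component with $(-1)^m$, which is what relationship (11.37) predicts for $k = 0$.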

Now let us examine the case $k > 0$. Let us define the subspace $M_1 = M \cap M'$. Then $k = \dim M_1$. By Theorem 11.12, there exists an $m$-dimensional isotropic subspace $N \subset L$ such that $L = M \oplus N$. Let us choose in the subspace $M$ a basis $e_1, \ldots, e_m$


11.3 Isotropic Subspaces 405 


such that its first $k$ vectors $e_1, \ldots, e_k$ form a basis in $M_1$. Then clearly, we have the decomposition


$$M = M_1 \oplus M_2, \quad \text{where } M_1 = \langle e_1, \ldots, e_k \rangle, \quad M_2 = \langle e_{k+1}, \ldots, e_m \rangle.$$


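Above (see Lemma 11.13), we constructed the isomorphism $\mathcal{F} : N \xrightarrow{\sim} M^*$ and with its help, defined a basis $g_1, \ldots, g_m$ in the space $N$ by the formula $\mathcal{F}(g_i) = f_i$, where $f_1, \ldots, f_m$ is a basis of the space $M^*$, the dual basis to $e_1, \ldots, e_m$. We obviously have the decomposition


$$N = N_1 \oplus N_2, \quad \text{where } N_1 = \langle g_1, \ldots, g_k \rangle, \quad N_2 = \langle g_{k+1}, \ldots, g_m \rangle,$$


where by our construction, $\mathcal{F} : N_1 \xrightarrow{\sim} M_1^*$ and $\mathcal{F} : N_2 \xrightarrow{\sim} M_2^*$.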
Let us consider the linear transformation $U_0 : L \to L$ defined by the formula


$$U_0(e_i) = g_i, \quad U_0(g_i) = e_i \quad \text{for } i = 1, \ldots, k,$$
$$U_0(e_i) = e_i, \quad U_0(g_i) = g_i \quad \text{for } i = k + 1, \ldots, m.$$

It is obvious that the transformation $U_0$ is orthogonal, and also $U_0^2 = \mathcal{E}$ and


$$U_0(M_1) = N_1, \quad U_0(M_2) = M_2, \quad U_0(N_1) = M_1, \quad U_0(N_2) = N_2. \tag{11.38}$$


In the basis $e_1, \ldots, e_m, g_1, \ldots, g_m$ that we constructed in the space $L$, the matrix of the transformation $U_0$ has the block form


$$U_0 = \begin{pmatrix} 0 & 0 & E_k & 0 \\ 0 & E_{m-k} & 0 & 0 \\ E_k & 0 & 0 & 0 \\ 0 & 0 & 0 & E_{m-k} \end{pmatrix},$$


where $E_k$ and $E_{m-k}$ are the identity matrices of orders $k$ and $m - k$. As is evident, $U_0$ becomes the identity matrix after the transposition of its rows with indices $i$ and $m + i$, $i = 1, \ldots, k$. Therefore, $|U_0| = (-1)^k$.
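The three properties of $U_0$ used here (it is an involution, it preserves the bilinear form, and its determinant is $(-1)^k$) can be checked on small cases. This is only an illustrative sketch: it treats $U_0$ as a permutation of the basis $e_1, \ldots, e_m, g_1, \ldots, g_m$ and assumes that in this basis the form $\varphi$ has the matrix with entries $\varphi(e_i, g_i) = 1$ and zeros elsewhere. All function names are hypothetical.

```python
def u0_images(m, k):
    """U_0 as a permutation of basis indices 0..2m-1 (e_1..e_m, then g_1..g_m).

    It swaps e_i <-> g_i for i <= k and fixes the remaining basis vectors.
    """
    image = list(range(2 * m))
    for i in range(k):
        image[i], image[m + i] = m + i, i
    return image


def permutation_sign(image):
    """Sign of a permutation: every cycle of even length contributes -1."""
    seen = [False] * len(image)
    sign = 1
    for start in range(len(image)):
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = image[j]
            length += 1
        if length > 0 and length % 2 == 0:
            sign = -sign
    return sign


def phi_entry(m, i, j):
    """Assumed Gram matrix of the form: 1 exactly on the pairs (e_i, g_i)."""
    return 1 if abs(i - j) == m else 0


def preserves_phi(m, image):
    """Check phi(U_0 b_i, U_0 b_j) == phi(b_i, b_j) on all pairs of basis vectors."""
    n = 2 * m
    return all(phi_entry(m, image[i], image[j]) == phi_entry(m, i, j)
               for i in range(n) for j in range(n))


def is_involution(image):
    """Check that applying the permutation twice gives the identity."""
    return all(image[image[i]] == i for i in range(len(image)))
```

The determinant of a permutation matrix equals the sign of the permutation, so `permutation_sign(u0_images(m, k))` plays the role of $|U_0|$.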

Let us prove that $U_0(M') \cap M = (0)$. Since $U_0^2 = \mathcal{E}$, this is equivalent to $M' \cap U_0(M) = (0)$. Let us assume that $x \in M' \cap U_0(M)$. From the membership $x \in U_0(M)$ and decomposition $M = M_1 \oplus M_2$, taking into account (11.38), it follows that $x \in N_1 \oplus M_2$, that is,


$$x = z_1 + y_2, \quad \text{where } z_1 \in N_1, \quad y_2 \in M_2. \tag{11.39}$$

Thus for every vector $y_1 \in M_1$, we have the equality


$$\varphi(x, y_1) = \varphi(z_1, y_1) + \varphi(y_2, y_1). \tag{11.40}$$


The left-hand side of equality (11.40) equals zero, since $x \in M'$, $y_1 \in M_1 \subset M'$, and the subspace $M'$ is isotropic with respect to $\varphi$. The second term $\varphi(y_2, y_1)$




on the right-hand side is equal to zero, since $y_i \in M_i \subset M$, $i = 1, 2$, and the subspace $M$ is isotropic with respect to $\varphi$. Thus from relationship (11.40), it follows that $\varphi(z_1, y_1) = 0$ for every vector $y_1 \in M_1$.

This last conclusion means that for the isomorphism $\mathcal{F} : N_1 \xrightarrow{\sim} M_1^*$, there corresponds to the vector $z_1 \in N_1$ a linear function on $M_1$ that is identically equal to zero. But that can be the case only if the vector $z_1$ itself is equal to 0. Thus in the decomposition (11.39), we have $z_1 = 0$, and therefore, the vector $x = y_2$ is contained in the subspace $M_2$. On the other hand, by virtue of the inclusions $M_2 \subset M$ and $x \in M' \cap U_0(M)$, taking into account the definition of the subspace $M_1 = M \cap M'$, this vector is also contained in $M_1$. As a result, we obtain that $x \in M_1 \cap M_2$, while by virtue of the decomposition $M = M_1 \oplus M_2$, this means that $x = 0$.

Thus the subspaces $U_0(M')$ and $M$ are included in the case $k = 0$ already considered, and relationship (11.37) has been proved for them. By Theorem 11.16, there exists an orthogonal transformation $U_1 : L \to L$ such that $U_1(U_0(M')) = M$. Then, as we have proved, $|U_1| = (-1)^m$. The orthogonal transformation $U = U_1 U_0$ takes the isotropic subspace $M'$ to $M$, and for it we have the relationship


$$|U| = |U_1| \cdot |U_0| = (-1)^m (-1)^k = (-1)^{k+m},$$


which completes the proof of the theorem. 


We note two corollaries to Theorem 11.19. 


Corollary 11.20 The families $\mathfrak{M}_1$ and $\mathfrak{M}_2$ do not have an $m$-dimensional isotropic subspace in common.


Proof Let us assume that there exist $m$-dimensional isotropic subspaces $M_1 \in \mathfrak{M}_1$ and $M_2 \in \mathfrak{M}_2$ such that $M_1 = M_2$. Then we clearly have the equality $\dim(M_1 \cap M_2) = m$, and by Theorem 11.19, $M_1$ and $M_2$ cannot belong to different families $\mathfrak{M}_1$ and $\mathfrak{M}_2$.


Corollary 11.21 If two $m$-dimensional isotropic subspaces intersect in a subspace of dimension $m - 1$, then they belong to different families $\mathfrak{M}_1$ and $\mathfrak{M}_2$.


This follows from the fact that m and m — 1 have opposite parity. 


Case 2. Now we may proceed to an examination of the second case, in which the 
dimension of the space L is odd. It is considerably easier and can be reduced to the 
already considered case of even dimensionality. 

In order to retain the previous notation used in the even-dimensional case, let us denote by $\overline{L}$ the space of odd dimension $2m + 1$ under consideration and let us embed it as a hyperplane in a space $L$ of dimension $2m + 2$. Let us denote by $F$ a nonsingular quadratic form on $L$ and by $\overline{F}$ its restriction to $\overline{L}$. Our further reasoning will be based on the following fact.




Lemma 11.22 For every nonsingular quadratic form $F$ there exists a hyperplane $\overline{L} \subset L$ such that the quadratic form $\overline{F}$ is nonsingular.


Proof In a complex projective space, all nonsingular quadratic forms are equivalent. Therefore, it suffices to prove the required assertion for any one form $F$. For $F$, let us take the nonsingular form (11.33) that we encountered previously with $m$ replaced by $m + 1$. Thus for a vector $x \in L$ with coordinates $(x_1, \ldots, x_{2m+2})$, we have

$$F(x) = \sum_{i=1}^{m+1} x_i x_{m+1+i}. \tag{11.41}$$


Let us define a hyperplane $\overline{L} \subset L$ by the equation $x_1 = x_{m+2}$. The coordinates in $\overline{L}$ are collections $(x_1, \ldots, x_{m+1}, \widehat{x_{m+2}}, x_{m+3}, \ldots, x_{2m+2})$, where the symbol $\widehat{\ }$ indicates the omission of the coordinate underneath it, and the quadratic form $\overline{F}$ in these coordinates takes the form


$$\overline{F}(x) = x_1^2 + \sum_{i=2}^{m+1} x_i x_{m+1+i}. \tag{11.42}$$


The matrix of the quadratic form (11.42) has the block form

$$\begin{pmatrix} 1 & 0 \\ 0 & \tfrac{1}{2}\Phi \end{pmatrix},$$

where $\Phi$ is the matrix from formula (11.32). Since the determinant $|\Phi|$ is nonzero, it follows that the quadratic form (11.42) is nonsingular.


We shall further investigate the $m$-dimensional subspaces $\overline{M} \subset \overline{L}$, isotropic with respect to the nonsingular quadratic form $\overline{F}$, which is the restriction to the hyperplane $\overline{L}$ of the nonsingular quadratic form $F$ given in the surrounding space $L$. Since in the complex projective space $\overline{L}$ all nonsingular quadratic forms are equivalent, it follows that all our results will be valid for an arbitrary nonsingular quadratic form on $\overline{L}$.

Let us consider an arbitrary $(m + 1)$-dimensional subspace $M \subset L$, isotropic with respect to $F$, and let us set $\overline{M} = M \cap \overline{L}$. It is obvious that the subspace $\overline{M} \subset \overline{L}$ is isotropic with respect to $\overline{F}$. Since in the space $L$, the hyperplane $\overline{L}$ is defined by a single linear equation, it follows that either $M \subset \overline{L}$ (and then $\overline{M} = M$), or $\dim \overline{M} = \dim M - 1 = m$. But the first case is impossible, since $\dim \overline{M} \le \frac{1}{2} \dim \overline{L} = \frac{1}{2}(2m + 1)$, and $\dim M = m + 1$. Thus there remains the second case: $\dim \overline{M} = m$. Let us show that such an association with an $(m + 1)$-dimensional isotropic subspace $M \subset L$ of an $m$-dimensional isotropic subspace $\overline{M} \subset \overline{L}$ gives all the subspaces $\overline{M}$ of interest to us and in a certain sense, it is unique.




Theorem 11.23 For every $m$-dimensional subspace $\overline{M} \subset \overline{L}$ isotropic with respect to $\overline{F}$, there exists an $(m + 1)$-dimensional subspace $M \subset L$, isotropic with respect to $F$, such that $\overline{M} = M \cap \overline{L}$. Moreover, in each of the families $\mathfrak{M}_1$ and $\mathfrak{M}_2$ of subspaces isotropic with respect to $F$, there exists such an $M$, and it is unique.


Proof Let us consider an arbitrary $m$-dimensional subspace $\overline{M} \subset \overline{L}$, isotropic with respect to $\overline{F}$, and let us denote by $\overline{M}^\perp$ its orthogonal complement with respect to the symmetric bilinear form $\varphi$ associated with the quadratic form $F$ in the surrounding space $L$. According to our previous notation, it should have been denoted by $\overline{M}^\perp_\varphi$, but we shall suppress the subscript, since the bilinear form $\varphi$ will be always one and the same. From relationship (7.75), which is valid for a nondegenerate (with respect to the form $\varphi$) space $L$ and an arbitrary subspace of it (p. 267), it follows that


$$\dim \overline{M}^\perp = \dim L - \dim \overline{M} = 2m + 2 - m = m + 2.$$


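Let us denote by $\widetilde{\varphi}$ the restriction of the bilinear form $\varphi$ to $\overline{M}^\perp$, and by $\widetilde{F}$ the restriction of the quadratic form $F$ to $\overline{M}^\perp$. The forms $\widetilde{\varphi}$ and $\widetilde{F}$ are singular in general. By definition (p. 198), the radical of the bilinear form $\widetilde{\varphi}$ is equal to $\overline{M}^\perp \cap (\overline{M}^\perp)^\perp = \overline{M}^\perp \cap \overline{M}$. But since $\overline{M}$ is isotropic, it follows that $\overline{M} \subset \overline{M}^\perp$, and therefore, the radical of the bilinear form $\widetilde{\varphi}$ coincides with $\overline{M}$. By relationship (6.17) from Sect. 6.2, the rank of the bilinear form $\widetilde{\varphi}$ is equal to


$$\dim \overline{M}^\perp - \dim (\overline{M}^\perp)^\perp = \dim \overline{M}^\perp - \dim \overline{M} = (m + 2) - m = 2,$$
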
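and in the subspace $\overline{M}^\perp$, we may choose a basis $e_1, \ldots, e_{m+2}$ such that its last $m$ vectors are contained in $\overline{M}$ (that is, in the radical of $\widetilde{\varphi}$), and the restriction of $\widetilde{\varphi}$ to $\langle e_1, e_2 \rangle$ has matrix $\begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix}$.


Thus we have the decomposition $\overline{M}^\perp = \langle e_1, e_2 \rangle \oplus \overline{M}$, where the restriction of the quadratic form $F$ to $\langle e_1, e_2 \rangle$ in our basis has the form $x_1 x_2$, and the restriction of $F$ to $\overline{M}$ is identically equal to zero.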

Let us set $M_i = \overline{M} \oplus \langle e_i \rangle$, $i = 1, 2$. Then $M_1$ and $M_2$ are $(m + 1)$-dimensional subspaces in $L$. It follows from this construction that the $M_i$ are isotropic with respect to the bilinear form $\varphi$. Here $M_i \cap \overline{L} = \overline{M}$, since on the one hand, from considerations of dimensionality, $M_i \not\subset \overline{L}$, and on the other hand, $\overline{M} \subset M_i$ and $\overline{M} \subset \overline{L}$. We have thus constructed two isotropic subspaces $M_i \subset L$ such that $M_i \cap \overline{L} = \overline{M}$. That they belong to different families $\mathfrak{M}_i$ and that in neither of these families are there any other subspaces with these properties, follows from Corollary 11.21.
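The construction can be traced by hand in the smallest case $m = 1$. The sketch below is an illustration only, under the following assumptions: the surrounding space has coordinates $(x_1, x_2, x_3, x_4)$, the form is $x_1 x_3 + x_2 x_4$ (that is, (11.41) with $m = 1$), the hyperplane is $x_1 = x_3$, and the isotropic line is spanned by $(0, 1, 0, 0)$. The names `form`, `in_hyperplane`, `isotropic`, and `meets_hyperplane_only_in_m_bar` are hypothetical, and isotropy is tested on a finite grid of integer coefficients rather than symbolically.

```python
from itertools import product


def form(x):
    """F(x) = x1*x3 + x2*x4, i.e. (11.41) with m = 1."""
    return x[0] * x[2] + x[1] * x[3]


def in_hyperplane(x):
    """The hyperplane used in Lemma 11.22 for m = 1: x1 = x3."""
    return x[0] == x[2]


def span(basis, coeffs):
    """Linear combination of the basis vectors with the given coefficients."""
    return tuple(sum(c * v[i] for c, v in zip(coeffs, basis)) for i in range(4))


GRID = range(-3, 4)

m_bar = (0, 1, 0, 0)            # isotropic line inside the hyperplane
m1 = [m_bar, (1, 0, 0, 0)]      # first isotropic plane containing it
m2 = [m_bar, (0, 0, 1, 0)]      # second isotropic plane containing it


def isotropic(basis):
    """F vanishes on every sampled linear combination of the basis."""
    return all(form(span(basis, c)) == 0
               for c in product(GRID, repeat=len(basis)))


def meets_hyperplane_only_in_m_bar(plane):
    """a*m_bar + b*e lies in the hyperplane exactly when b == 0."""
    return all(in_hyperplane(span(plane, (a, b))) == (b == 0)
               for a, b in product(GRID, repeat=2))
```

In this example the two planes intersect in the line spanned by `m_bar`, a subspace of dimension $(m + 1) - 1$, so by Corollary 11.21 they lie in different families, in agreement with the theorem.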


Thus we have shown that there exists a bijection between the set of $m$-dimensional isotropic subspaces $\overline{M} \subset \overline{L}$ and each of the families $\mathfrak{M}_i$ of $(m + 1)$-dimensional isotropic subspaces $M \subset L$. This fact is expressed by saying that $m$-dimensional subspaces $\overline{M} \subset \overline{L}$ isotropic with respect to a nonsingular quadratic form $\overline{F}$ form a single family.




Of course, our partition of the set of isotropic subspaces into families is a matter 
of convention. It is mostly a tribute to tradition originating in the special cases con- 
sidered in analytic geometry. However, it is possible to give a more precise meaning 
to this partition by describing these subspaces in terms of Plücker coordinates.

In the previous chapter, we showed that k-dimensional subspaces M of an n- 
dimensional space L are in one-to-one correspondence with the points of some pro- 
jective algebraic variety G(k,n), called the Grassmannian. Suppose we are given 
some nonsingular quadratic form $F$ on the space $L$. Let us denote by $I(k, n)$ the
subset of points of the Grassmannian G(k, n) that correspond to the k-dimensional 
isotropic subspaces. 

We shall state the following propositions without proof, since they relate not to 
linear algebra, but rather to algebraic geometry.⁴


Proposition 11.24 The set I (k,n) is a projective algebraic variety. 


In other words, this proposition asserts that the property of a subspace being isotropic can be described by certain homogeneous relationships among its Plücker coordinates.

A projective algebraic variety $X$ is said to be irreducible if it cannot be represented in the form of a union $X = X_1 \cup X_2$, where $X_i$ are projective algebraic varieties different from $X$ itself.

Suppose the space L has odd dimension n = 2m + 1. 


Proposition 11.25 The set I (m, 2m + 1) is an irreducible projective algebraic va- 
riety. 


Now let the space $L$ have even dimension $n = 2m$. We shall denote by $I_i(m, 2m)$ the subset of the projective algebraic variety $I(m, 2m)$ whose points correspond to $m$-dimensional isotropic subspaces of the family $\mathfrak{M}_i$. Theorem 11.19 and its corollaries show that


$$I(m, 2m) = I_1(m, 2m) \cup I_2(m, 2m), \qquad I_1(m, 2m) \cap I_2(m, 2m) = \varnothing.$$

This suggests the idea that the projective algebraic variety $I(m, 2m)$ is reducible.


Proposition 11.26 The sets I;(m, 2m), i = 1, 2, are irreducible projective algebraic 
varieties. 


Finally, we have the following assertion, which relates to the isotropism of a 
subspace whose dimension is less than maximal. 


Proposition 11.27 For all k < n/2, the projective algebraic variety I (k,n) is irre- 
ducible. 


⁴ The reader can find them, for example, in the book Methods of Algebraic Geometry, by Hodge and Pedoe (Cambridge University Press, 1994).




11.4 Quadrics in a Real Projective Space 


Let us consider a projective space P(L), where L is a real vector space. As before, we 
shall restrict our attention to the case of nonsingular quadrics. As we saw in Sect. 6.3 
(formula (6.28)), a nonsingular quadratic form in a real space has the canonical form 


$$x_0^2 + x_1^2 + \cdots + x_s^2 - x_{s+1}^2 - \cdots - x_n^2 = 0. \tag{11.43}$$


Here the index of inertia r = s + 1 will be the same in every coordinate system in 
which the quadric is given by the canonical equation. 

If we multiply equation (11.43) by $-1$, we obviously do not change the quadric that it defines, and therefore, we may assume that $s + 1 \ge n - s$, that is, $s \ge (n - 1)/2$. Moreover, $s \le n$, but in the case $s = n$, from equation (11.43) we obtain $x_0 = 0$, $x_1 = 0$, \ldots, $x_n = 0$, and there is no such point in projective space.

Thus, in contrast to a complex projective space, in a real projective space of given 
dimension n, there exists (up to a projective transformation) not one, but several 
nonsingular quadrics. However, there is only a finite number of them; they corre- 
spond to various values s, where we may assume that 


$$\frac{n - 1}{2} \le s \le n - 1. \tag{11.44}$$


To be sure, it is still necessary to prove that the quadrics corresponding to the various 
values of s are not projectively equivalent. But we shall consider this question (in 
an even more complex situation) in the next section. 

Thus the number of projectively inequivalent nonsingular quadrics in a real projective space of dimension $n$ is equal to the number of integers $s$ satisfying inequality (11.44). If $n$ is odd, $n = 2m + 1$, then inequality (11.44) gives $m \le s \le 2m$, and the number of projectively inequivalent quadrics is equal to $m + 1$. And if $n$ is even, $n = 2m$, then there are $m$ of them. In particular, for $n = 2$, all nonsingular quadrics in the projective plane are projectively equivalent. The most typical example is the circle $x^2 + y^2 = 1$, which is contained entirely in the affine part $x_2 \ne 0$ if the equation is written as $x_0^2 + x_1^2 - x_2^2 = 0$ in homogeneous coordinates $(x_0 : x_1 : x_2)$ (here inhomogeneous coordinates are expressed by the formulas $x = x_0/x_2$, $y = x_1/x_2$).
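The count of admissible values of $s$ is easy to tabulate. A minimal sketch (the function name `inertia_values` is hypothetical) that lists the integers $s$ satisfying inequality (11.44) for a given dimension $n$:

```python
def inertia_values(n):
    """Integers s with (n - 1)/2 <= s <= n - 1, as in inequality (11.44)."""
    return [s for s in range(n) if 2 * s >= n - 1]


def quadric_count(n):
    """Number of admissible values of s for a projective space of dimension n."""
    return len(inertia_values(n))
```

For instance, `inertia_values(3)` lists the two values of $s$ that correspond to the two types of quadrics in three-dimensional projective space.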

In three-dimensional projective space, there exist two types of projectively inequivalent quadrics. In homogeneous coordinates $(x_0 : x_1 : x_2 : x_3)$, one of them is given by the equation $x_0^2 + x_1^2 + x_2^2 - x_3^2 = 0$. Here we always have $x_3 \ne 0$, the quadric lies in the affine part, and it is given in inhomogeneous coordinates $(x, y, z)$ by the equation $x^2 + y^2 + z^2 = 1$, where $x = x_0/x_3$, $y = x_1/x_3$, $z = x_2/x_3$. This quadric is a sphere. The second type is given by the equation $x_0^2 + x_1^2 - x_2^2 - x_3^2 = 0$. This is a hyperboloid of one sheet.

Their projective inequivalence can be seen at the very least from the fact that 
not a single real line lies on the first of them (the sphere), while on the second 
(hyperboloid of one sheet), there are two families each consisting of an infinite 
number of lines, called the rectilinear generatrices. 

Of course, we can embed a real space $L$ into a complex space $L^{\mathbb{C}}$, and similarly, embed $\mathbb{P}(L)$ into $\mathbb{P}(L^{\mathbb{C}})$. Therefore, everything that was said in Sect. 11.3 about




isotropic subspaces is applicable in our case. However, although our quadric is real, the isotropic subspaces obtained in this way can turn out to be complex. The only exceptions are the cases $s = (n - 1)/2$ for odd $n$ and $s = n/2$ for even $n$.

In the first instance, we may combine the coordinates into pairs $(x_i, x_{s+1+i})$ and set $u_i = x_i + x_{s+1+i}$ and $v_i = x_i - x_{s+1+i}$. Then taking into account the equalities


$$x_i^2 - x_{s+1+i}^2 = (x_i + x_{s+1+i})(x_i - x_{s+1+i}),$$

equation (11.43) can be written in the form

$$u_0 v_0 + u_1 v_1 + \cdots + u_s v_s = 0. \tag{11.45}$$


But this is the case of the quadric (11.33), which we considered in the previous 
section. It is easy to see that the reasoning used in Sect. 11.3 gives us a description 
of the real subspaces of a quadric. 

The case s = n/2 for even n also does not remove us from the realm of real sub- 
spaces and also leads to the case considered in the previous section. Moreover, if the 
equation of a quadric has the form (11.45) over an arbitrary field K of characteristic 
different from 2, then the reasoning from the previous section remains in force. 

In the general case, it is still possible to determine the dimensions of the spaces 
contained in a quadric. For this, we may make use of considerations already used in 
the proof of the law of inertia (Theorem 6.17 from Sect. 6.3). There we observed that 
the index of inertia (in the given case, the index of inertia of the quadratic form from 
(11.43), equal to s + 1) coincides with the maximal dimension of the subspaces L’ on 
which the restriction of the form is positive definite. (Let us note that this condition 
gives a geometric characteristic of the index of inertia, that is, it depends only on 
the set of solutions of the equation F (x) = 0, and not on the form F that defines it.) 

Indeed, let the quadric $Q$ be given by the equation $F(x) = 0$. If the restriction $F'$ of the form $F$ to the subspace $L'$ is positive definite, then it is clear that $Q \cap \mathbb{P}(L') = \varnothing$. Thus if we are dealing with a projective space $\mathbb{P}(L)$, where $\dim L = n + 1$, then in $L$ there exists a subspace $\widetilde{L}$ of dimension $s + 1$ such that the restriction of the form $F$ to it is positive definite. This means that $Q \cap \mathbb{P}(\widetilde{L}) = \varnothing$ (however, such a subspace $\widetilde{L}$ is also easily determined explicitly on the basis of equation (11.43)). If $L' \subset L$ is a subspace such that $\mathbb{P}(L') \subset Q$, then $L' \cap \widetilde{L} = (0)$. Hence by Corollary 3.42, we obtain the inequality $\dim \widetilde{L} + \dim L' \le \dim L = n + 1$. Consequently, $\dim L' + s + 1 \le n + 1$, and this means that $\dim L' \le n - s$. Thus for the space $\mathbb{P}(L')$ belonging to the quadric given by equation (11.43), we obtain $\dim L' \le n - s$ and therefore $\dim \mathbb{P}(L') \le n - s - 1$.

On the other hand, it is easy to produce a subspace of dimension $n - s - 1$ actually belonging to the quadric (11.43). To this end, let us combine in pairs the unknowns appearing in equation (11.43) with different signs and let us equate the unknowns in one pair, for example $x_0 = x_{s+1}$, and so on. Since we have assumed that $s + 1 \ge n - s$, we may form $n - s$ such pairs, and therefore, we obtain $n - s$ linear equations. How many unknowns remain? Since we have combined $2(n - s)$ unknowns into




pairs, and in all there were $n + 1$ of them, there remain $n + 1 - 2(n - s)$ unknowns (it is possible that this number will be equal to zero). Setting each of the remaining unknowns equal to zero, we obtain altogether


$$(n - s) + n + 1 - 2(n - s) = n + 1 - (n - s)$$


linear equations in coordinates in the space $L$. Since different unknowns occur in all these equations, these equations are linearly independent and determine in $L$ a subspace $L'$ of dimension $n - s$. Then $\dim \mathbb{P}(L') = n - s - 1$. Of course, since $\mathbb{P}(L')$ is contained in $Q$, an arbitrary subspace $\mathbb{P}(L'') \subset \mathbb{P}(L')$ for $L'' \subset L'$ is also contained in $Q$. Thus in the quadric $Q$ are contained subspaces of all dimensions $r \le n - s - 1$.
We have therefore proved the following result.


Theorem 11.28 If a nonsingular quadric $Q$ in a real projective space of dimension $n$ is given by the equation $F(x_0, \ldots, x_n) = 0$ and the index of inertia of the quadratic form $F$ is equal to $s + 1$, then in $Q$ are contained projective subspaces only of dimension $r \le n - s - 1$, and for each such number $r$ there can be found in $Q$ a projective subspace of dimension $r$ (when $s + 1 \ge n - s$, which is always possible to attain without changing the quadric $Q$, but changing only the quadratic form $F$ that determines it to $-F$).
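The subspace of maximal dimension built in the argument above can be written out explicitly. The following sketch is an illustration under the conventions of equation (11.43): coordinates $x_0, \ldots, x_n$, the first $s + 1$ squares taken with a plus sign, and $s + 1 \ge n - s$. The names `polar` and `isotropic_basis` are hypothetical. The basis vectors are $e_i + e_{s+1+i}$, which corresponds to equating the unknowns in each pair and setting the unpaired unknowns to zero.

```python
def polar(x, y, s):
    """Symmetric bilinear form of x_0^2 + ... + x_s^2 - x_{s+1}^2 - ... - x_n^2."""
    return sum((1 if t <= s else -1) * a * b
               for t, (a, b) in enumerate(zip(x, y)))


def isotropic_basis(n, s):
    """Basis of a subspace L' of dimension n - s on which the form vanishes.

    Requires (n - 1)/2 <= s <= n - 1, as in inequality (11.44).
    """
    if not (2 * s >= n - 1 and s <= n - 1):
        raise ValueError("s must satisfy (n - 1)/2 <= s <= n - 1")
    basis = []
    for i in range(n - s):
        v = [0] * (n + 1)
        v[i] = 1            # coordinate entering with a plus sign
        v[s + 1 + i] = 1    # its partner entering with a minus sign
        basis.append(v)
    return basis
```

Since the bilinear form vanishes on every pair of basis vectors, the quadratic form vanishes on their whole span, so the corresponding projective subspace of dimension $n - s - 1$ lies in the quadric.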


We have already considered an example of a quadric in real three-dimensional 
projective space (n = 3). Let us note that in this space there are only two nonempty 
quadrics: for s = 1 and s = 2. 

For $s = 2$, equation (11.43) can be written in the form


$$x_0^2 + x_1^2 + x_2^2 = x_3^2. \tag{11.46}$$


As we have already said, for points of a real quadric, we have $x_3 \ne 0$. This means that our quadric is entirely contained in this affine subset. Setting $x = x_0/x_3$, $y = x_1/x_3$, $z = x_2/x_3$, we shall write its equation in the form


$$x^2 + y^2 + z^2 = 1.$$


This is the familiar two-dimensional sphere $S^2$ in three-dimensional Euclidean space. Let us discover what lines lie on it. Of course, no real line can lie on a sphere, since every line has points that are arbitrarily distant from the center of the sphere, while for all points of the sphere, their distance from the center of the sphere is equal to 1. Therefore, we can be talking only about complex lines of the space $\mathbb{P}(L^{\mathbb{C}})$. If in equation (11.46) we make the substitution $x_2 = iy$, where $i$ is the imaginary unit, we obtain the equation $x_0^2 + x_1^2 - y^2 - x_3^2 = 0$, which in the new coordinates


$$u_0 = x_0 + y, \quad v_0 = x_0 - y, \quad u_1 = x_1 + x_3, \quad v_1 = x_1 - x_3$$


takes the form


$$u_0 v_0 + u_1 v_1 = 0. \tag{11.47}$$




Fig. 11.3 Hyperboloid of one sheet


We studied such an equation in Sect. 11.2 (see Example 11.11). As an example of a line lying in the given quadric, we may take the line given by equations (11.25): $u_0 = \lambda u_1$, $v_0 = -\lambda^{-1} v_1$, with arbitrary complex number $\lambda \ne 0$ and arbitrary $u_1$, $v_1$. In general, such a line contains not a single real point of our quadric (that is, points corresponding to real values of the coordinates $x_0, \ldots, x_3$). Indeed, if the number $\lambda$ is not real, then the equality $u_0 = \lambda u_1$ contradicts the fact that $u_0$ and $u_1$ are real. The case $u_0 = u_1 = 0$ would correspond to a point with coordinates $x_1 = x_3 = 0$, for which $x_0^2 + x_2^2 = 0$, that is, all $x_i$ are equal to zero.

Thus on the sphere lies a set of complex lines containing not a single real point. If desired, all of them could be described by formulas (11.27) and (11.28) after changes in coordinates that we described earlier. However, of greater interest are the complex lines lying on the sphere and containing at least one real point. For each such line $l$ containing a real point $P$ of the sphere, the complex conjugate line $\bar{l}$ (that is, consisting of points $\overline{Q}$, where $Q$ takes values on the line $l$) also lies on the sphere and contains the point $P$. But by Theorem 11.19, through every point $P$ pass exactly two lines (even if complex). We see that through every point of the sphere there pass exactly two complex lines, which are the complex conjugates of each other.

Finally, the case $s = 1$ leads to the equation


$$x_0^2 + x_1^2 - x_2^2 - x_3^2 = 0, \tag{11.48}$$

which after a change of coordinates

$$u_0 = x_0 + x_2, \quad v_0 = x_0 - x_2, \quad u_1 = x_1 + x_3, \quad v_1 = x_1 - x_3,$$


also assumes the form (11.47). For this equation, we have described all the lines contained in a quadric by formulas (11.27) and (11.28), where clearly, real values must be assigned to the parameters $a, b, c, d$ in these formulas. In this case, the obtained quadric is a hyperboloid of one sheet, and the lines are its rectilinear generatrices. See Fig. 11.3.
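A real line on the hyperboloid can be checked directly. The sketch below is an illustration only. It assumes the pairing of coordinates $u_0 = x_0 + x_2$, $v_0 = x_0 - x_2$, $u_1 = x_1 + x_3$, $v_1 = x_1 - x_3$ (the general substitution $u_i = x_i + x_{s+1+i}$, $v_i = x_i - x_{s+1+i}$ with $s = 1$) and a line of the form $u_0 = \lambda u_1$, $v_0 = -\lambda^{-1} v_1$ with real $\lambda \ne 0$. The function names are hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def point_on_line(lam, u1, v1):
    """Point (x0, x1, x2, x3) of the line u0 = lam*u1, v0 = -v1/lam.

    The free parameters u1, v1 run over the line; lam != 0 selects the line.
    """
    lam, u1, v1 = Fraction(lam), Fraction(u1), Fraction(v1)
    u0, v0 = lam * u1, -v1 / lam
    # Invert u0 = x0 + x2, v0 = x0 - x2, u1 = x1 + x3, v1 = x1 - x3.
    return ((u0 + v0) / 2, (u1 + v1) / 2, (u0 - v0) / 2, (u1 - v1) / 2)


def quadric(x):
    """Left-hand side of equation (11.48)."""
    x0, x1, x2, x3 = x
    return x0 ** 2 + x1 ** 2 - x2 ** 2 - x3 ** 2
```

Evaluating `quadric(point_on_line(lam, u1, v1))` for several values of the parameters gives zero each time, since $u_0 v_0 + u_1 v_1 = -u_1 v_1 + u_1 v_1$.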

Let us visualize what this surface looks like; that is, let us find a more familiar set that is homeomorphic to this surface. To this end, let us choose one line in each family of rectilinear generatrices: in the first, $l_0$; in the second, $l_1$. As we saw in Sect. 9.4, every projective line is homeomorphic to the circle $S^1$. On the other hand,




Fig. 11.4 A torus


every line in the second family of generatrices is uniquely determined by its point of intersection with the line $l_0$, and similarly, every line of the first family is determined by its point of intersection with the line $l_1$. Finally, through every point of the surface pass exactly two lines: one from the first family of generatrices, and the other from the second.

Thus is established a bijection between the points of a quadric given by equation (11.48) and pairs of points $(x, y)$, where $x \in l_0$, $y \in l_1$, that is, the set $S^1 \times S^1$. It is easily ascertained that this bijection is a homeomorphism. The set $S^1 \times S^1$ is called a torus. It is most simply represented as the surface obtained by rotating a circle about an axis lying in the same plane as the circle but not intersecting it. See Fig. 11.4. Such a surface looks like the surface of a bagel. As a result, we obtain that the quadric given by equation (11.48) in three-dimensional real projective space is homeomorphic to a torus.


11.5 Quadrics in a Real Affine Space 


Now we proceed to the study of quadrics in a real affine space $(V, L)$. Let us choose in this space a frame of reference $(O; e_1, \ldots, e_n)$. Then every point $A \in V$ is given by its coordinates $(x_1, \ldots, x_n)$. A quadric is the set of all points $A \in V$ such that

$$F(x_1, \ldots, x_n) = 0, \tag{11.49}$$


where F is some second-degree polynomial. There is now no reason to consider the 
polynomial F to be homogeneous (as was the case in a projective space). 

Collecting in F(x) terms of the second, first, and zeroth degrees, we shall write 
them in the form 


$$F(x) = \psi(x) + f(x) + c, \tag{11.50}$$


where $\psi(x)$ is a quadratic form, $f(x)$ is a linear form, and $c$ is a scalar. The quadrics
F(x) = 0 thus obtained for n = 2 and 3 represent the curves and surfaces of order 
two studied in courses in analytic geometry. 

Let us note that according to our definition of a quadric as a set of points satisfying relationship (11.49), we obtain even in the simplest cases, $n = 2$ and $3$, sets that are not ordinarily regarded as curves or surfaces of degree two. The same “strange”




examples show that dissimilar-looking second-degree polynomials can define one 
and the same quadric, that is, the solution set of equation (11.49). 

For example, in real three-dimensional space with coordinates $x, y, z$, the equation $x^2 + y^2 + z^2 + c = 0$ has no solution in $x, y, z$ if $c > 0$, and therefore for any $c > 0$, it defines the empty set. Another example is the equation $x^2 + y^2 = 0$, which is satisfied only with $x = y = 0$ but for all $z$, that is, this equation defines a line, namely the $z$-axis. But the same line ($z$-axis) is defined, for example, by the equation $ax^2 + by^2 = 0$ with any numbers $a$ and $b$ of the same sign.

Let us prove that if we exclude such “pathological” cases, then every quadric is 
defined by an equation that is unique up to a nonzero constant factor. Here it will be 
convenient to consider the empty set a special case of an affine subspace. 


Theorem 11.29 If a quadric $Q$ does not coincide with a set of points of any affine subspace and can be given by two different equations $F_1(x) = 0$ and $F_2(x) = 0$, where the $F_i$ are second-degree polynomials, then $F_2 = \lambda F_1$, where $\lambda$ is some nonzero real number.


Proof Since by the given condition, the quadric $Q$ is not empty, it must contain some point $A$. By Theorem 8.14, there exists another point $B \in Q$ such that the line $l$ passing through $A$ and $B$ does not lie entirely in $Q$.

Let us select in the affine space $V$ a frame of reference $(O; e_1, \ldots, e_n)$ in which the point $O$ is equal to $A$ and the vector $e_1$ is equal to $\overrightarrow{AB}$. The line passing through the points $A$ and $B$ consists of points with coordinates $(x_1, 0, \ldots, 0)$ for all possible real values $x_1$. Let us write down the equation $F_i(x) = 0$, $i = 1, 2$, defining our quadric after arranging terms in order of the degree of $x_1$. As a result, we obtain the equations


$$F_i(x_1, \ldots, x_n) = a_i x_1^2 + f_i(x_2, \ldots, x_n) x_1 + \psi_i(x_2, \ldots, x_n) = 0, \quad i = 1, 2,$$


where $f_i(x_2, \ldots, x_n)$ and $\psi_i(x_2, \ldots, x_n)$ are inhomogeneous polynomials of first and second degree in the variables $x_2, \ldots, x_n$. After defining $f_i(0, \ldots, 0) = f_i(O)$ and $\psi_i(0, \ldots, 0) = \psi_i(O)$, we may say that the relationship


$$a_i x_1^2 + f_i(O) x_1 + \psi_i(O) = 0 \tag{11.51}$$


holds for $x_1 = 0$ (point $A$) and for $x_1 = 1$ (point $B$), but does not hold identically for all real values $x_1$. From this it follows that $\psi_i(O) = 0$ and $a_i + f_i(O) = 0$. This means that $a_i \ne 0$, for otherwise, we would obtain that relationship (11.51) was satisfied for all $x_1$. By multiplying the polynomial $F_i$ by $a_i^{-1}$, we may assume that $a_i = 1$.

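Let us denote by $\bar{x}$ the projection of the vector $x$ onto the subspace $\langle e_2, \ldots, e_n \rangle$ parallel to the subspace $\langle e_1 \rangle$, that is, $\bar{x} = (x_2, \ldots, x_n)$. Then we may say that the two equations


$$x_1^2 + f_1(\bar{x}) x_1 + \psi_1(\bar{x}) = 0 \quad \text{and} \quad x_1^2 + f_2(\bar{x}) x_1 + \psi_2(\bar{x}) = 0, \tag{11.52}$$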




where $f_i(\bar{x})$ are first-degree polynomials and $\psi_i(\bar{x})$ are second-degree polynomials of the vector $\bar{x}$, have identical solutions. Furthermore, we know that they both have two solutions, $x_1 = 0$ and $x_1 = 1$, for $\bar{x} = 0$, that is, the discriminant of each quadratic trinomial


$$p_i(x_1) = x_1^2 + f_i(\bar{x}) x_1 + \psi_i(\bar{x}), \quad i = 1, 2,$$


with coefficients depending on the vector $\bar{x}$, for $\bar{x} = 0$, is positive.

The coefficients of the trinomial $p_i(x_1)$ can be viewed as polynomials in the variables $x_2, \ldots, x_n$, that is, the coordinates of the vector $\bar{x}$. Consequently, the discriminant of the trinomial $p_i(x_1)$ is also a polynomial in the variables $x_2, \ldots, x_n$, and therefore, it depends on them continuously. From the definition of continuity, it follows that there exists a number $\varepsilon > 0$ such that the discriminant of each trinomial $p_i(x_1)$ is positive for all $\bar{x}$ such that $|x_2| < \varepsilon, \ldots, |x_n| < \varepsilon$. This condition can be written compactly in the form of the single inequality $|\bar{x}| < \varepsilon$, assuming that the space of vectors $\bar{x}$ is somehow converted into a Euclidean space in which is defined the length of a vector $|\bar{x}|$. For example, it can be defined by the relationship $|\bar{x}|^2 = x_2^2 + \cdots + x_n^2$.

Thus the quadratic trinomials $p_i(x_1)$ with leading coefficient 1 and coefficients $f_i(\bar{x})$ and $\psi_i(\bar{x})$, depending continuously on $\bar{x}$, each have two distinct roots for all $|\bar{x}| < \varepsilon$. Since equations (11.52) have identical solutions, for each such $\bar{x}$ the two trinomials have the same pair of roots, and as is known from elementary algebra, trinomials with leading coefficient 1 and the same two roots coincide. Therefore, $f_1(\bar{x}) = f_2(\bar{x})$ and $\psi_1(\bar{x}) = \psi_2(\bar{x})$ for all $|\bar{x}| < \varepsilon$. Hence on the basis of the following lemma, we obtain that these equalities are satisfied not only for $|\bar{x}| < \varepsilon$, but in general for all vectors $\bar{x}$.


Lemma 11.30 If for some number $\varepsilon > 0$, the polynomials $f(\bar{x})$ and $g(\bar{x})$ coincide for all $\bar{x}$ such that $|\bar{x}| < \varepsilon$, then they coincide identically for all $\bar{x}$.


Proof Let us represent each of the polynomials $f(\bar{x})$ and $g(\bar{x})$ as a sum of homogeneous terms:


$$f(\bar{x}) = \sum_{k=0}^{N} f_k(\bar{x}), \qquad g(\bar{x}) = \sum_{k=0}^{N} g_k(\bar{x}). \tag{11.53}$$

Let us set $\bar{x} = \alpha \bar{y}$, where $|\bar{y}| < \varepsilon$ and the number $\alpha$ is in $[0, 1]$. Then the condition $|\bar{x}| < \varepsilon$ is clearly satisfied, and this means that $f(\bar{x}) = g(\bar{x})$. Setting $\bar{x} = \alpha \bar{y}$ in equality (11.53), we obtain


$$\sum_{k=0}^{N} \alpha^k f_k(\bar{y}) = \sum_{k=0}^{N} \alpha^k g_k(\bar{y}). \tag{11.54}$$


On the one hand, equality (11.54) holds for all $\alpha \in [0, 1]$, of which there are infinitely many. On the other hand, (11.54) represents an equality between two polynomials in the variable $\alpha$. As is well known, polynomials of a single variable taking the same values for an infinite number of values of the variable coincide identically, that is, they have the same coefficients. Therefore, we obtain the equalities




SkQ) = gxQy) for all k =0,..., N and all y for which |y| < ¢. But since the poly- 
nomials f; and gx, are homogeneous, it follows that these equalities hold in general 
for all y. 

Indeed, every vector $y$ can be represented in the form $y = \alpha z$ with some scalar
$\alpha$ and vector $z$ for which $|z| < \varepsilon$. For example, for $y \neq 0$ it suffices to set $\alpha = (2/\varepsilon)|y|$, so that $|z| = \varepsilon/2$.
Consequently, we obtain $f_k(z) = g_k(z)$. But if we multiply both sides of this
equality by $\alpha^k$ and invoke the homogeneity of $f_k$ and $g_k$, we obtain the equality
$f_k(\alpha z) = g_k(\alpha z)$, that is, $f_k(y) = g_k(y)$, which is what was to be proved.
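The scaling step of the proof can be checked numerically. The following is a minimal sketch, assuming a polynomial is stored as a dictionary from exponent tuples to coefficients; the helper names `homogeneous_parts` and `evaluate` and the sample polynomial are illustrative choices, not notation from the text. It splits a polynomial into homogeneous parts $f_k$ and confirms that $f(\alpha y) = \sum_k \alpha^k f_k(y)$ at one sample point.

```python
from collections import defaultdict


def homogeneous_parts(poly):
    """Split a polynomial into homogeneous components.

    `poly` maps exponent tuples to coefficients, e.g. {(2, 0): 1.0, (0, 1): 3.0}
    stands for x1**2 + 3*x2. Returns {degree: sub-polynomial}.
    """
    parts = defaultdict(dict)
    for exponents, coeff in poly.items():
        parts[sum(exponents)][exponents] = coeff
    return dict(parts)


def evaluate(poly, point):
    """Evaluate a polynomial given in the dictionary representation."""
    total = 0.0
    for exponents, coeff in poly.items():
        term = coeff
        for x, e in zip(point, exponents):
            term *= x ** e
        total += term
    return total


# Hypothetical example: f = 1 + 3*x2 + x1**2 - 2*x1*x2.
f = {(0, 0): 1.0, (0, 1): 3.0, (2, 0): 1.0, (1, 1): -2.0}
parts = homogeneous_parts(f)

y, alpha = (0.3, -0.2), 0.5
scaled = tuple(alpha * c for c in y)

# Left side: f(alpha * y). Right side: sum of alpha**k * f_k(y).
lhs = evaluate(f, scaled)
rhs = sum(alpha ** k * evaluate(fk, y) for k, fk in parts.items())
print(abs(lhs - rhs) < 1e-12)
```

Both sides are polynomials in $\alpha$, which is why agreement on infinitely many values of $\alpha$ forces the coefficients $f_k(y)$ and $g_k(y)$ to agree.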


Let us note that we might have posed this same question about the uniqueness 
of the correspondence between quadrics and their defining equations with regard 
to quadrics in projective space. But in projective space, the polynomial defining a 
quadric is homogeneous, and this question can be resolved even more easily. So that 
we wouldn’t have to repeat ourselves, we have considered the question in the more 
complex situation. 

Let us now investigate a question that is considered already in a course in analytic 
geometry for spaces of dimension 2 and 3: into what simplest form can equation 
(11.49) be brought by a suitable choice of frame of reference in an affine space 
of arbitrary dimension n? This question is equivalent to the following: under what 
conditions can two quadrics be transformed into each other by a nonsingular affine 
transformation? 

We shall consider quadrics in an affine space (V,L) of dimension n, assuming 
that for smaller values of n, this problem has already been solved. In this regard, we 
shall not consider quadrics that are cylinders, that is, having the form 


$$Q = h^{-1}(Q'),$$


where (h, A) is an affine transformation of the space (V,L) into the affine space 
(V’,L’) of dimension m <n, and Q’ is some subset of V’. Let us ascertain that in 
this case, Q’ is a quadric in V’. 

Let the quadric $Q$ in a coordinate system associated with some frame of reference
of the affine space $V$ be defined by the second-degree equation $F(x_1, \dots, x_n) = 0$.
Let us choose in the $m$-dimensional affine space $V'$ some frame of reference
$(O'; e'_1, \dots, e'_m)$. Then $e'_1, \dots, e'_m$ is a basis in the vector space $L'$. In the definition
of a cylinder, one has the condition $\mathcal{A}(L) = L'$. Let us denote by $e_1, \dots, e_m$
vectors $e_i \in L$ such that $\mathcal{A}(e_i) = e'_i$, $i = 1, \dots, m$, and let us consider the subspace
$M = \langle e_1, \dots, e_m \rangle$ that they span. By Corollary 3.31, there exists a subspace $N \subset L$
such that $L = M \oplus N$. Let $O \in V$ be an arbitrary point such that $h(O) = O'$. Then
in the coordinate system associated with the frame of reference $(O; e_1, \dots, e_n)$, where $e_{m+1}, \dots, e_n$ is a basis of $N$,
the projection of the space $L$ onto $M$ parallel to the subspace $N$ and the associated
projection $h$ of the affine space $V$ onto $V'$ are defined by the condition

$$h(x_1, \dots, x_n) = (x_1, \dots, x_m),$$

where $x_1, \dots, x_m$ on the right-hand side are coordinates in the associated frame of
reference $(O'; e'_1, \dots, e'_m)$. Then the fact that $Q$ is a quadric means that its second-degree equation
$F(x_1, \dots, x_n) = 0$ is satisfied irrespective of the values that we have substituted
for the variables $x_{m+1}, \dots, x_n$ if the point with coordinates $(x_1, \dots, x_m)$ belongs
to the set $Q'$. For example, we may set $x_{m+1} = 0, \dots, x_n = 0$. Then the equation
$F(x_1, \dots, x_m, 0, \dots, 0) = 0$ will be precisely the equation of the quadric $Q'$.

The same reasoning shows that if a polynomial $F$ depends on fewer than $n$ unknowns,
then the quadric $Q$ defined by the equation $F(x) = 0$ is a cylinder. Therefore,
in the sequel we shall consider only quadrics that are not cylinders. Our goal
will be the classification of these quadrics using nonsingular affine transformations. 
Two quadrics that can be mapped one into the other by such a transformation are 
said to be affinely equivalent. 

First of all, let us consider the effect of a translation on the equation of a quadric. 
Let the equation of the quadric $Q$ in coordinates associated with some frame of
reference $(O; e_1, \dots, e_n)$ have the form

$$F(x) = \psi(x) + f(x) + c = 0, \tag{11.55}$$

where $\psi(x)$ is a quadratic form, $f(x)$ is a linear form, and $c$ is a number. If $\mathcal{T}_a$ is a
translation by the vector $a \in L$, then the quadric $\mathcal{T}_{-a}(Q)$ is given by the equation

$$\psi(x + a) + f(x + a) + c = 0.$$


Let us consider how the equation of a quadric is transformed under these conditions. 
Let $\varphi(x, y)$ be the symmetric bilinear form associated with the quadratic form $\psi(x)$,
that is, $\psi(x) = \varphi(x, x)$. Then

$$\psi(x + a) = \varphi(x + a, x + a) = \varphi(x, x) + 2\varphi(x, a) + \varphi(a, a) = \psi(x) + 2\varphi(x, a) + \psi(a).$$

As a result, we obtain that after a translation $\mathcal{T}_a$:

(a) The quadratic part $\psi(x)$ does not change.
(b) The linear part $f(x)$ is replaced by $f(x) + 2\varphi(x, a)$.
(c) The constant term $c$ is replaced by $c + f(a) + \psi(a)$.
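The three rules above can be verified on a concrete polynomial. Below is a minimal sketch, assuming the quadratic form is given by a symmetric matrix `M` (so that $\varphi(x, y) = x^T M y$) and the linear form by a coefficient list `f`; the function names and sample numbers are illustrative assumptions.

```python
def bilinear(M, x, y):
    """phi(x, y) = x^T M y for a symmetric matrix M given as nested lists."""
    n = len(x)
    return sum(x[i] * M[i][j] * y[j] for i in range(n) for j in range(n))


def evaluate(M, f, c, x):
    """F(x) = psi(x) + f(x) + c with psi(x) = phi(x, x)."""
    return bilinear(M, x, x) + sum(fi * xi for fi, xi in zip(f, x)) + c


def translate_quadric(M, f, c, a):
    """Coefficients of the polynomial F(x + a), following rules (a), (b), (c)."""
    n = len(a)
    # (b) linear part: f(x) + 2*phi(x, a)
    new_f = [f[i] + 2 * sum(M[i][j] * a[j] for j in range(n)) for i in range(n)]
    # (c) constant term: c + f(a) + psi(a)
    new_c = c + sum(f[i] * a[i] for i in range(n)) + bilinear(M, a, a)
    # (a) quadratic part is unchanged
    return M, new_f, new_c


# Hypothetical sample data.
M = [[1, 2], [2, -1]]
f = [3, -4]
c = 5
a = [1, -2]
x = [2, 3]

shifted_point = [xi + ai for xi, ai in zip(x, a)]
direct = evaluate(M, f, c, shifted_point)          # F(x + a) computed directly
via_rules = evaluate(*translate_quadric(M, f, c, a), x)
print(direct == via_rules)
```

With integer input the comparison is exact, so the check does not depend on floating-point tolerance.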


Using statement (b), we see that with the aid of a translation $\mathcal{T}_a$, it is sometimes possible
to eliminate the first-degree terms in the equation of a quadric. More precisely, this
is possible if there exists a vector $a \in L$ such that

$$f(x) = -2\varphi(x, a) \tag{11.56}$$

for an arbitrary $x \in L$. By Theorem 6.3, any bilinear form $\varphi(x, y)$ can be represented
in the form $\varphi(x, y) = (x, \mathcal{A}(y))$ via some linear transformation $\mathcal{A}: L \to L^*$.
Then condition (11.56) can be written in the form $(x, f) = -2(x, \mathcal{A}(a))$ for all
$x \in L$, that is, in the form $f = -2\mathcal{A}(a) = \mathcal{A}(-2a)$. This means that the condition
(11.56) amounts to the linear function $f \in L^*$ being contained in the image of the
transformation $\mathcal{A}$.
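In coordinates, condition (11.56) is the linear system $2Ma = -f$, where $M$ is the symmetric matrix of $\varphi$. The sketch below is one way to test solvability, assuming rational input and using Gauss-Jordan elimination over `fractions.Fraction`; the function name `find_center_shift` is a hypothetical helper, not something defined in the text.

```python
from fractions import Fraction


def find_center_shift(M, f):
    """Return a vector a with f(x) = -2*phi(x, a) for all x, or None.

    M is the symmetric matrix of phi, f the coefficient list of the linear
    form. Solves 2*M*a = -f exactly; free variables are set to zero.
    """
    n = len(f)
    # Augmented matrix [2M | -f].
    rows = [[Fraction(2 * M[i][j]) for j in range(n)] + [Fraction(-f[i])]
            for i in range(n)]
    pivots = []
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, n) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][col]
        rows[r] = [v / lead for v in rows[r]]
        for i in range(n):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [vi - factor * vr for vi, vr in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    # A zero row with a nonzero right-hand side: f is not in the image.
    if any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in rows):
        return None
    a = [Fraction(0)] * n
    for row_index, col in enumerate(pivots):
        a[col] = rows[row_index][-1]
    return a


# x1^2 - x2^2 + 2*x1 + 4*x2 = 0: the linear part can be eliminated.
print(find_center_shift([[1, 0], [0, -1]], [2, 4]))
# x1^2 + x2 = 0 (a parabola): no such vector exists.
print(find_center_shift([[1, 0], [0, 0]], [0, 1]))
```

In the first example the polynomial equals $(x_1 + 1)^2 - (x_2 - 2)^2 + 3$, so the returned vector $(-1, 2)$ is the coordinate vector of the center in the original coordinates.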




First of all, let us investigate those quadrics for which condition (11.56) is satis- 
fied. In this case, there exists a frame of reference of the affine space in which the 
quadric can be represented by the equation 


$$F(x) = \psi(x) + c = 0. \tag{11.57}$$


This equation exhibits a remarkable symmetry: it is invariant under a change of the 
vector x into —x. Let us investigate this further. 


Definition 11.31 Let $V$ be an affine space and $A$ a point of $V$. A central symmetry
with respect to a point $A$ is a mapping $V \to V$ that maps each point $B \in V$ to the
point $B' \in V$ such that $\overrightarrow{AB'} = -\overrightarrow{AB}$.

It is obvious that by this condition, the point $B'$, and therefore the mapping,
is uniquely determined. A trivial verification shows that this mapping is an affine
transformation and its linear part is equal to $-\mathcal{E}$.


Definition 11.32 A set $Q \subset V$ is said to be centrally symmetric with respect to a
point $A \in V$ if it is invariant under a central symmetry with respect to the point $A$,
which in this case is called the center of the set $Q$.

It follows from the definition that a point $A$ is a center of a quadric if and only
if the quadric is transformed into itself by the linear transformation $-\mathcal{E}$, that is,
$x \mapsto -x$, where $x = \overrightarrow{AX}$ for every point $X$ of this quadric.


Theorem 11.33 If a quadric does not coincide with an affine space, is not a cylinder,
and has a center, then the center is unique.


Proof Let $A$ and $B$ be two distinct centers of the quadric $Q$. This means, by definition,
that for every point $X \in Q$, there exists a point $X' \in Q$ such that

$$\overrightarrow{AX} = -\overrightarrow{AX'}, \tag{11.58}$$

and for every point $Y \in Q$, there exists a point $Y' \in Q$ such that

$$\overrightarrow{BY} = -\overrightarrow{BY'}. \tag{11.59}$$

Let us apply relationship (11.58) to an arbitrary point $X \in Q$, and relationship
(11.59) to the associated point $X' = Y$. Let us denote the point $Y'$ obtained as a
result of these actions by $X''$. It is obvious that

$$\overrightarrow{XX''} = \overrightarrow{XA} + \overrightarrow{AB} + \overrightarrow{BX''}, \tag{11.60}$$

and from relationships (11.58) and (11.59), it follows that $\overrightarrow{XA} = \overrightarrow{AX'}$ and $\overrightarrow{BX''} = \overrightarrow{X'B}$.
Substituting the last expressions into (11.60) and using $\overrightarrow{AX'} + \overrightarrow{X'B} = \overrightarrow{AB}$, we obtain that $\overrightarrow{XX''} = 2\overrightarrow{AB}$. In
other words, this means that if the vector $e$ is equal to $2\overrightarrow{AB}$, then the quadric $Q$ is
invariant under the translation $\mathcal{T}_e$; see Fig. 11.5. This assertion also follows from an
examination of the similar triangles $ABX'$ and $XX''X'$ in Fig. 11.5.

Fig. 11.5 Similar triangles

Since $A \neq B$, the vector $e$ is nonnull. Let us choose an arbitrary frame of reference
$(O; e_1, \dots, e_n)$, where $e_1 = e$. Let us set $L' = \langle e_2, \dots, e_n \rangle$ and consider
the affine space $V' = (L', L')$ and mapping $h: V \to V'$, defined by the following
conditions: $h(O) = O$, $h(A_1) = O$ if $\overrightarrow{OA_1} = e_1$, and $h(A_i) = e_i$ if $\overrightarrow{OA_i} = e_i$
($i = 2, \dots, n$). It is obvious that the mapping $h$ is a projection and that the set $Q$ is a
cylinder. Since by our assumption, the quadric $Q$ is not a cylinder, we have obtained
a contradiction.
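The key identity of the proof, that two central symmetries applied in succession give a translation by $2\overrightarrow{AB}$, is easy to check in coordinates. The following is a small sketch with hypothetical sample points; `central_symmetry` is an illustrative helper name.

```python
def central_symmetry(center, point):
    """Image B' of `point` B under the central symmetry about `center` A.

    The defining condition AB' = -AB gives B' = 2A - B in coordinates.
    """
    return tuple(2 * a - b for a, b in zip(center, point))


A = (1, 0)
B = (3, 1)
X = (5, -2)

X1 = central_symmetry(A, X)    # the point X'
X2 = central_symmetry(B, X1)   # the point X''

shift = tuple(x2 - x for x2, x in zip(X2, X))     # the vector XX''
two_AB = tuple(2 * (b - a) for a, b in zip(A, B))  # the vector 2AB
print(shift == two_AB)
```

The result does not depend on the choice of $X$, which is why a set with two distinct centers is invariant under the whole translation.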


Thus we obtain that by choosing a system of coordinates with the origin at the
center of the quadric, one can define an arbitrary quadric satisfying the conditions
of Theorem 11.33 by the equation

$$\psi(x_1, \dots, x_n) = c, \tag{11.61}$$

where $\psi$ is a nonsingular quadratic form (in the case of a singular form $\psi$, the
quadric would be a cylinder).

If $c \neq 0$, then we may assume that $c = 1$ by multiplying both sides of equality
(11.61) by $c^{-1}$. Finally, we may execute a linear transformation that preserves the
origin and brings the quadratic form $\psi$ into canonical form (6.22). As a result, the
equation of the quadric takes the form

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_n^2 = c, \tag{11.62}$$

where $c = 0$ or $1$, and the number $r$ is the index of inertia of the quadratic form $\psi$.

If $c = 0$ and $r = 0$ or $r = n$, then it follows that $x_1 = 0, \dots, x_n = 0$, that is,
the quadric consists of a single point, the origin, which contradicts the assumption
made above that it does not coincide with some affine subspace. Likewise, for $c = 1$
and $r = 0$, we obtain that $-x_1^2 - \cdots - x_n^2 = 1$, and this is impossible for real
$x_1, \dots, x_n$, so that the quadric consists of the empty set, which again contradicts our
assumption.


We have thus proved the following assertion. 


Theorem 11.34 If a quadric does not coincide with an affine subspace, is not a
cylinder, and has a center, then in some coordinate system, it is defined by equation
(11.62). Moreover, $0 < r \le n$, and if $c = 0$, then $r < n$.


In the case $c = 0$, it is possible, by multiplying the equation of a quadric by $-1$,
to obtain that in (11.62), the number of positive terms is not less than the number of
negative terms, that is, $r \ge n - r$, or equivalently, $r \ge n/2$. In the sequel, we shall
always assume that in the case $c = 0$, this condition is satisfied.

Theorem 11.34 asserts that every quadric that is not an affine subspace or a cylinder
and that has a center can be transformed with the help of a suitable nonsingular
affine transformation into a quadric given by equation (11.62). For $c = 0$ (and only
in this case), the quadric (11.62) is a cone (with its vertex at the origin), that is, for
every one of its points $x$, it also contains the entire line $\langle x \rangle$. It is possible to indicate
another characteristic property of a quadric given by equation (11.62) for $c = 0$: it
is not smooth, while in the case $c = 1$, the quadric is smooth. This follows at once
from the definition of singular points (the equalities $F = 0$ and $\partial F / \partial x_i = 0$).
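The smoothness claim can be spelled out in one line. The computation below is a sketch that writes equation (11.62) as $F = 0$ with the constant moved to the left-hand side; that sign convention for $F$ is an assumption made here for the calculation only.

```latex
\[
F(x) = x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_n^2 - c,
\qquad
\frac{\partial F}{\partial x_i} =
\begin{cases}
 2x_i, & 1 \le i \le r,\\
 -2x_i, & r < i \le n.
\end{cases}
\]
% All partial derivatives vanish only at x = 0, and F(0) = -c.
\[
\frac{\partial F}{\partial x_1} = \cdots = \frac{\partial F}{\partial x_n} = 0
\iff x = 0,
\qquad F(0) = -c .
\]
```

Hence the origin is the only candidate for a singular point, and it lies on the quadric exactly when $c = 0$.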

Let us now consider quadrics without a center. Such a quadric Q is defined by 
the equation 


$$F(x) = \psi(x) + f(x) + c = 0, \tag{11.63}$$

where $\psi(x)$ is a quadratic form, $f(x)$ a linear form, $c$ a scalar. As earlier, we shall
write the symmetric bilinear form $\varphi(x, y)$ corresponding to the quadratic form $\psi(x)$
as $\varphi(x, y) = (x, \mathcal{A}(y))$, where $\mathcal{A}: L \to L^*$ is a linear transformation. We have seen
that for a quadric $Q$ not to have a center is equivalent to the condition $f \notin \mathcal{A}(L)$.

Let us choose an arbitrary basis $e_1, \dots, e_{n-1}$ in the hyperplane $L' = \langle f \rangle^a$ defined
in the space $L$ by the linear homogeneous equation $f(x) = 0$, and let us extend this
basis to a basis of the entire space $L$ by means of a vector $e_n \perp L'$ such that $f(e_n) = 1$
(here, of course, orthogonality is understood with respect to the
bilinear form $\varphi(x, y)$). In the obtained frame of reference $(O; e_1, \dots, e_n)$, equation
(11.63) can be written in the form

$$F(x) = \psi'(x_1, \dots, x_{n-1}) + \alpha x_n^2 + x_n + c = 0, \tag{11.64}$$

where $\psi'$ is the restriction of the quadratic form $\psi$ to the hyperplane $L'$.

Let us now choose in $L'$ a new basis $e'_1, \dots, e'_{n-1}$ in which the quadratic form
$\psi'$ has the canonical form

$$\psi'(x_1, \dots, x_{n-1}) = x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_{n-1}^2. \tag{11.65}$$

It is obvious that in this case, the coordinate origin $O$ and the vector $e_n$ remain
unchanged. If as a result, the quadratic form $\psi'$ turned out to depend on fewer than
$n - 1$ variables, then the polynomial $F$ in equation (11.63) would depend on fewer
than $n$ variables, and that, as we have seen, means that the quadric $Q$ is a cylinder.

Let us show that in formula (11.64), the number $\alpha$ is equal to 0. If $\alpha \neq 0$, then by
virtue of the obvious relationship $\alpha x_n^2 + x_n + c = \alpha (x_n + \beta)^2 + c'$, where $\beta = 1/(2\alpha)$
and $c' = c - \beta/2$, we obtain that via the translation $\mathcal{T}_a$ by the vector $a = -\beta e_n$,
equation (11.64) is transformed into

$$F(x) = \psi'(x_1, \dots, x_{n-1}) + \alpha x_n^2 + c' = 0,$$

where $\psi'$ has the form (11.65). But such an equation, as is easily seen, gives a
quadric with a center.

Thus assuming that the quadric $Q$ is not a cylinder and does not have a center,
we obtain that its equation has the form

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_{n-1}^2 + x_n + c = 0.$$

Now let us perform a translation $\mathcal{T}_a$ by the vector $a = -c e_n$. As a result, the coordinates
$x_1, \dots, x_{n-1}$ are unchanged, while $x_n$ is changed to $x_n - c$. In the new
coordinates, the equation of the quadric assumes the form

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_{n-1}^2 + x_n = 0. \tag{11.66}$$

By multiplying the equation of the quadric by $-1$ and changing the coordinate $x_n$
to $-x_n$, we can obtain that the number of positive squares in equation (11.66) is
not less than the number of negative squares, that is, $r \ge n - r - 1$, or equivalently,
$r \ge (n - 1)/2$.

We have thereby obtained the following result. 


Theorem 11.35 Every quadric that is not an affine subspace or a cylinder and does
not have a center can be given in some coordinate system by equation (11.66), where
$r$ is a number satisfying the condition $(n - 1)/2 \le r \le n - 1$.


Thus by combining Theorems 11.34 and 11.35, we obtain the following result:
Every quadric that is not an affine subspace or a cylinder can be given in some
coordinate system by equation (11.62) if it has a center and by equation
(11.66) if it does not have a center. We call these equations canonical.

Theorems 11.34 and 11.35 do more than give the simplest form into which the 
equation of a quadric can be transformed through a suitable choice of coordinate 
system. Beyond that, it follows from these theorems that quadrics having a canonical 
form (11.62) or (11.66) can be affinely equivalent (that is, transformable into each 
other by a nonsingular affine transformation) only if their equations coincide. 

On the way to proving this assertion, we shall first establish that quadrics defined
by equation (11.66) never have a center. Indeed, writing the equation of a quadric
in the form (11.50), we may say that it has a center only if $f \in \mathcal{A}(L)$. But a simple
verification shows that this condition is not satisfied for quadrics defined by equation
(11.66). Indeed, if in some basis $e_1, \dots, e_n$ of the space $L$, the quadratic form $\psi(x)$
is given as

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_{n-1}^2,$$

then on choosing the dual basis $f_1, \dots, f_n$ of the dual space $L^*$, we obtain
that the linear transformation $\mathcal{A}: L \to L^*$ associated with $\psi$ by the relationship
$\varphi(x, y) = (x, \mathcal{A}(y))$, in which $\varphi(x, y)$ is the symmetric bilinear form determined by
the quadratic form $\psi$, has the form $\mathcal{A}(e_i) = f_i$ for $i = 1, \dots, r$, $\mathcal{A}(e_i) = -f_i$ for
$i = r + 1, \dots, n - 1$, and $\mathcal{A}(e_n) = 0$, and the linear form $x_n$ coincides with $f_n$. Thus
$\mathcal{A}(L) = \langle f_1, \dots, f_{n-1} \rangle$ and $f = f_n \notin \mathcal{A}(L)$.

We may now formulate the fundamental theorem on the classification of quadrics 
with respect to nonsingular affine transformations. 


Theorem 11.36 Any quadric that is not an affine subspace or cylinder can be rep- 
resented in some coordinate system by the canonical equation (11.62) or (11.66), 
where the number r satisfies the conditions indicated in Theorems 11.34 and 11.35 




respectively. And conversely, every pair of quadrics having the canonical equation 
(11.62) or (11.66) in some coordinate systems can be transformed into each other 
by a nonsingular affine transformation only if their canonical equations coincide. 


Proof Only the second part of the theorem remains to be proved. We have already 
seen that quadrics given by equations (11.62) and (11.66) cannot be mapped into 
each other by nonsingular affine transformations, since in the first case, the quadric 
has a center, while in the second case, it does not. Therefore, we may consider each 
case separately. 

Let us begin with the first case. Let there be given two quadrics $Q_1$ and $Q_2$,
given by different canonical equations of the form (11.62) (we note that the canonical
equations in this case differ by the value $c = 0$ or $1$ and index $r$), and where
$Q_2 = g(Q_1)$, with $(g, \mathcal{A})$ a nonsingular affine transformation. By assumption, each
quadric has a unique center, which in its chosen coordinate system coincides with
the point $O = (0, \dots, 0)$.

Let us write down the transformation $g$ in the form (8.19): $g = \mathcal{T}_a g_0$, where
$g_0(O) = O$. By assumption, $Q_2 = g(Q_1)$, and this means that $g(O) = O$, that is,
the vector $a$ is equal to $0$. In the equations of the quadrics, which we may write in
the form $F_i(x) = \psi_i(x) + c_i = 0$, $i = 1$ and $2$, it is clear that $F_i(0) = c_i$, and this
means that the constants $c_i$ coincide (in the sequel, we shall denote them by $c$). Thus
the equations of the quadrics $Q_1$ and $Q_2$ differ only in the quadratic part $\psi_i(x)$.

By Theorem 11.29, the transformation $g$ takes the polynomial $F_1(x) - c$ into
$\lambda(F_2(x) - c)$, where $\lambda$ is some nonzero real number. Consequently, the quadratic
form $\psi_1(x)$ is transformed into $\lambda \psi_2(x)$ by the linear transformation $\mathcal{A}$. If we denote
the indices of inertia of the quadratic forms $\psi_i(x)$ by $r_i$, then from the law of
inertia, it follows that either $r_2 = r_1$ (for $\lambda > 0$) or $r_2 = n - r_1$ (for $\lambda < 0$). In the
case $c = 0$, we may assume that $r_i \ge n/2$, and the equality $r_2 = n - r_1$ is possible
only for $r_2 = r_1$. In the case $c = 1$, this same result follows from the fact that the
transformation $\mathcal{A}$ takes the polynomial $\psi_1(x) - 1$ into $\lambda(\psi_2(x) - 1)$. Comparing
the constant terms, we obtain $\lambda = 1$.

In the case that the quadric has no center, we may repeat the same arguments. We
again obtain that the quadratic form $\psi_1(x)$ is carried into $\lambda \psi_2(x)$ by a nonsingular
linear transformation. Since each equation contains by assumption the term $x_n$,
it follows that $\lambda = \pm 1$, and from the law of inertia, it follows that $r_2 = r_1$ (for $\lambda > 0$),
or $r_2 = n - 1 - r_1$ (for $\lambda < 0$). Since by assumption, $r_i \ge (n - 1)/2$, the equality
$r_2 = n - 1 - r_1$ is possible only for $r_2 = r_1$.


Thus we see that in a real affine space of dimension n, there exists only a finite 
number of affinely inequivalent quadrics that are not affine subspaces or cylinders. 
Each of them is equivalent to a quadric that can be represented in the form of equa- 
tion (11.62) or equation (11.66). 

It is possible to compute the number of types of affinely inequivalent quadrics.
Equation (11.62) for $c = 1$ gives $n$ possibilities. The remaining cases depend on the
parity of the number $n$. If $n = 2m$, then equation (11.62) for $c = 0$ gives $m$ different
types, and the same number is given by equation (11.66). Altogether, we obtain
$n + 2m = 2n$ different types in the case of even $n$. If $n = 2m + 1$, then equation
(11.62) for $c = 0$ gives $m$ different types ($r = m + 1, \dots, 2m$), while equation
(11.66) gives $m + 1$ types ($r = m, \dots, 2m$). Altogether in this case we obtain
$n + 2m + 1 = 2n$ different types. Thus in a real affine space of dimension $n \ge 2$,
the number of types of affinely inequivalent quadrics that are not affine subspaces
or cylinders is equal to $2n$ for both even and odd $n$. For $n = 3$, for example, these
are the three quadrics with $c = 1$, the cone, and the two paraboloids.
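The admissible canonical equations can also be listed mechanically from the inequalities in Theorems 11.34 and 11.35. The sketch below assumes the normalizations $r \ge n - r$ for the cone case and $r \ge n - 1 - r$ for equation (11.66), as derived in the text; the function name `canonical_types` and the tuple layout are illustrative choices.

```python
def canonical_types(n):
    """List the canonical equations allowed by Theorems 11.34 and 11.35.

    Each entry is (kind, c, r): kind "central" refers to equation (11.62),
    kind "noncentral" to equation (11.66), for which c is None.
    Intended for n >= 2.
    """
    types = []
    # (11.62) with c = 1: 0 < r <= n.
    types += [("central", 1, r) for r in range(1, n + 1)]
    # (11.62) with c = 0: 0 < r < n, normalized so that r >= n - r.
    types += [("central", 0, r) for r in range(1, n) if 2 * r >= n]
    # (11.66): 0 <= r <= n - 1, normalized so that r >= n - 1 - r.
    types += [("noncentral", None, r) for r in range(0, n) if 2 * r >= n - 1]
    return types


for n in (2, 3, 4):
    print(n, canonical_types(n))
```

For $n = 2$ the list consists of the ellipse and hyperbola ($c = 1$), a pair of intersecting lines ($c = 0$), and the parabola.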


Remark 11.37 It is easy to see that the content of this section is reduced to the classification
of second-degree polynomials $F(x_1, \dots, x_n)$ up to a nonsingular affine
transformation of the variables and multiplication by a nonzero scalar coefficient.
The connection with the geometric object—the quadric—is established by Theo- 
rem 11.29. That we excluded from consideration the case of affine subspaces is 
related to the fact that we wished to emphasize the differences among the geometric 
objects that arise. 

The assumption that the quadric was not a cylinder was made exclusively to 
emphasize the inductive nature of the classification. The limitations that we intro- 
duced could have been done without. By repeating precisely the same arguments, 
we obtain that an arbitrary set in n-dimensional affine space given by equating a 
second-degree polynomial in n variables—the coordinates of a point—to zero is 
affinely equivalent to one of the sets defined by the following equations: 


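$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_m^2 = 1, \qquad 0 \le r \le m \le n, \tag{11.67}$$

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_m^2 = 0, \qquad r \ge \frac{m}{2}, \quad m \le n, \tag{11.68}$$

$$x_1^2 + \cdots + x_r^2 - x_{r+1}^2 - \cdots - x_{m-1}^2 + x_m = 0, \qquad r \ge \frac{m-1}{2}, \quad m \le n. \tag{11.69}$$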


After this, it is easy to see that in the case of (11.67) for r = 0, the empty set is ob- 
tained, while in the case (11.68) for r = 0 or r = m, the result is an affine subspace. 
In the remaining cases, it is easy to find a line that intersects the given set in two 
distinct points and is not entirely contained in it. By virtue of Theorem 8.14, this 
means that such a set is not an affine subspace. 


In conclusion, let us say a bit about the topological properties of affine quadrics. 

If in equation (11.62), we have $c = 1$ and the index of inertia $r$ is equal to 1, then
this equation can be rewritten in the form $x_1^2 = 1 + x_2^2 + \cdots + x_n^2$, from which it
follows that $x_1^2 \ge 1$, that is, $x_1 \ge 1$ or $x_1 \le -1$. Clearly, it is impossible for a point
of the quadric whose coordinate $x_1$ is greater than or equal to 1 to be continuously deformed
into a point whose coordinate $x_1$ is less than or equal to $-1$ while remaining on the
quadric (see the definition on p. xx). Therefore, a quadric in this case consists of two
components, that is, it consists of two subsets such that no two points lying one in 
each of these subsets can be continuously deformed into each other while remaining 
on the quadric. It can be shown that each of these components is path connected (see 
the definition on p. xx), just as is every quadric given by equation (11.66). 

The simplest example of a quadric consisting of two path-connected components 
is a hyperbola in the plane; see Fig. 11.6. 




Fig. 11.6 A hyperbola 


The topological property that we described above has a generalization to quadrics
defined by equation (11.62) for $c = 1$ with smaller values of the index $r$, but still
assuming that $r \ge 1$. Here we shall say a few words about them, without giving a
rigorous formulation and also omitting proofs.

For $r = 1$ we can find two points, $(1, 0, \dots, 0)$ and $(-1, 0, \dots, 0)$, that cannot be
transformed into each other by a continuous motion along the quadric (they could
be given as the sphere $x_1^2 = 1$ in one-dimensional space). For an arbitrary value of
$r$, the quadric contains the sphere

$$x_1^2 + \cdots + x_r^2 = 1, \qquad x_{r+1} = 0, \ \dots, \ x_n = 0.$$

One can prove that this sphere cannot be contracted to a single point by continuous
motion along the surface of the quadric. But for every $m < r$ and continuous
mapping $f$ of the sphere $S^{m-1}: y_1^2 + \cdots + y_m^2 = 1$ into the quadric, the image of
the sphere $f(S^{m-1})$ can be contracted to a point by continuous motion along the
quadric (it should be clear to the reader what is meant by continuous motion of a set
along a quadric, something that we have already encountered in the case $r = 1$).


11.6 Quadrics in an Affine Euclidean Space 


It remains for us to consider nonsingular quadrics in an affine Euclidean space V.
We shall, as before, exclude the cases in which the quadrics are affine subspaces 
or cylinders. The classification of such quadrics up to metric equivalence uses pre- 
cisely the same arguments as those used in Sect. 11.5. To some extent, the results 
of that section can be applied in our case, since motions are affine transformations. 
Therefore, we shall only cursorily recall the line of reasoning. 

Generalizing the statement of the problem, which goes back to analytic geometry 
(where cases dim V = 2 and 3 are considered), we shall say that two quadrics are 
metrically equivalent if they can be transformed into each other by some motion 
of the space V. This definition is a special case of metric equivalence of arbitrary 
metric spaces (see p. xxi), to which belong, as is easily verified, all quadrics in an 
affine Euclidean space. 

First of all, let us consider quadrics given by equations whose linear part can be 
annihilated by a translation. These are quadrics that have a center (which, as we
have seen, is unique). Choosing a coordinate origin (that is, a point $O$ of the frame
of reference $(O; e_1, \dots, e_n)$) in the center of the quadric, we bring its equation into
the form

$$\psi(x_1, \dots, x_n) = c,$$

where $\psi(x_1, \dots, x_n)$ is a nonsingular quadratic form, $c$ a number. If $c \neq 0$, then by
multiplying the equation by $c^{-1}$, we may assume that $c = 1$. For $c = 0$, the quadric
is a cone.

Using an orthogonal transformation, the quadratic form $\psi$ can be brought into
canonical form

$$\psi(x_1, \dots, x_n) = \lambda_1 x_1^2 + \lambda_2 x_2^2 + \cdots + \lambda_n x_n^2,$$

where all the numbers $\lambda_1, \dots, \lambda_n$ are nonzero, since by assumption, our quadric
is nonsingular and is neither an affine subspace nor a cylinder, which means that
the quadratic form $\psi$ is nonsingular. Let us separate the positive numbers from the
negative: suppose $\lambda_1, \dots, \lambda_k > 0$ and $\lambda_{k+1}, \dots, \lambda_n < 0$. By tradition going back
to analytic geometry, we shall set $\lambda_i = a_i^{-2}$ for $i = 1, \dots, k$ and $\lambda_j = -a_j^{-2}$ for
$j = k + 1, \dots, n$, where all numbers $a_1, \dots, a_n$ are positive.


Thus every quadric having a center is metrically equivalent to a quadric with
equation

$$\left(\frac{x_1}{a_1}\right)^2 + \cdots + \left(\frac{x_k}{a_k}\right)^2 - \left(\frac{x_{k+1}}{a_{k+1}}\right)^2 - \cdots - \left(\frac{x_n}{a_n}\right)^2 = c, \tag{11.70}$$

where $c = 0$ or $1$. For $c = 0$, multiplying equation (11.70) by $-1$, we may, as in the
affine case, assume that $k \ge n/2$.
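In the plane the passage from the quadratic form to the numbers $a_i$ can be made explicit, because the eigenvalues of a symmetric $2 \times 2$ matrix have a closed form. The following is a minimal sketch for the case $n = 2$ only; the function name `semiaxes_2d` and the sample forms are illustrative assumptions.

```python
import math


def semiaxes_2d(p, q, s):
    """Eigenvalues and numbers a_i for the form p*x1**2 + 2*q*x1*x2 + s*x2**2.

    Returns [(lambda_i, a_i)], larger eigenvalue first, where
    lambda_i = a_i**-2 if lambda_i > 0 and lambda_i = -a_i**-2 if lambda_i < 0.
    The form must be nonsingular (both eigenvalues nonzero).
    """
    mean = (p + s) / 2
    radius = math.hypot((p - s) / 2, q)
    eigenvalues = [mean + radius, mean - radius]
    if any(lam == 0 for lam in eigenvalues):
        raise ValueError("singular quadratic form")
    return [(lam, 1 / math.sqrt(abs(lam))) for lam in eigenvalues]


# 5*x1^2 + 6*x1*x2 + 5*x2^2 = 1 is an ellipse (both eigenvalues positive).
print(semiaxes_2d(5, 3, 5))
# x1^2 - 4*x2^2 = 1 is a hyperbola (eigenvalues of opposite sign).
print(semiaxes_2d(1, 0, -4))
```

An orthogonal change of basis to the eigenvectors then brings the equation into the form (11.70) with these values of $a_1$ and $a_2$.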
Now let us consider the case that the quadric

$$\psi(x_1, \dots, x_n) + f(x_1, \dots, x_n) + c = 0$$

does not have a center, that is, $f \notin \mathcal{A}(L)$, where $\mathcal{A}: L \to L^*$ is the linear transformation
associated with the quadratic form $\psi$ by the relationship $\varphi(x, y) = (x, \mathcal{A}(y))$,
in which $\varphi(x, y)$ is the symmetric bilinear form that gives the quadratic form $\psi$. In
this case, it is easy to verify that as in Sect. 11.5, we can find an orthonormal basis
$e_1, \dots, e_n$ of the space $L$ such that

$$f(e_1) = 0, \ \dots, \ f(e_{n-1}) = 0, \quad f(e_n) = 1,$$

and in the coordinate system determined by the frame of reference $(O; e_1, \dots, e_n)$,
the quadric is given by the equation

$$\lambda_1 x_1^2 + \lambda_2 x_2^2 + \cdots + \lambda_{n-1} x_{n-1}^2 + x_n + c = 0.$$

Through a translation by the vector $-c e_n$, this equation can be brought into the form

$$\lambda_1 x_1^2 + \lambda_2 x_2^2 + \cdots + \lambda_{n-1} x_{n-1}^2 + x_n = 0,$$

in which all the coefficients $\lambda_i$ are nonzero, since the quadric is nonsingular and is
not a cylinder.

If $\lambda_1, \dots, \lambda_k > 0$ and $\lambda_{k+1}, \dots, \lambda_{n-1} < 0$, then by multiplying the equation of
the quadric and the coordinate $x_n$ by $-1$ if necessary, we may assume that $k \ge (n - 1)/2$.
Setting, as previously, $\lambda_i = a_i^{-2}$ for $i = 1, \dots, k$ and $\lambda_j = -a_j^{-2}$ for
$j = k + 1, k + 2, \dots, n - 1$, where $a_1, \dots, a_{n-1} > 0$, we bring the previous equation
into the form

$$\left(\frac{x_1}{a_1}\right)^2 + \cdots + \left(\frac{x_k}{a_k}\right)^2 - \left(\frac{x_{k+1}}{a_{k+1}}\right)^2 - \cdots - \left(\frac{x_{n-1}}{a_{n-1}}\right)^2 + x_n = 0. \tag{11.71}$$
Thus every quadric in an affine Euclidean space is metrically equivalent to a
quadric given by equation (11.70) (type I) or (11.71) (type II). Let us verify (under
the given conditions and restriction on $r$) that two quadrics of the form (11.70) or
of the form (11.71) are metrically equivalent only if all the numbers $a_1, \dots, a_n$ (for
type I) and $a_1, \dots, a_{n-1}$ (for type II) in their equations are the same. Here we may
consider separately quadrics of type I and of type II, since they differ even from the
viewpoint of affine equivalence.

By Theorem 8.39, every motion of an affine Euclidean space is the composition
of a translation and an orthogonal transformation. As we saw in Sect. 11.5, a
translation does not alter the quadratic part of the equation of a quadric. By Theorem
11.29, two quadrics are affinely equivalent only if the polynomials appearing in
their equations differ by a constant factor. But for quadrics of type I for $c = 1$, this
factor must be equal to 1. In the case of a quadric of type I for $c = 0$, multiplication
by $\lambda > 0$ means that all the numbers $a_i$ are multiplied by $\lambda^{-1/2}$. For a quadric of
type II, this factor must also be equal to 1 in order to preserve the coefficient 1 in
the linear term $x_n$.

Thus we see that if we exclude quadrics of type I with constant term $c = 0$
(a cone), then the quadratic parts of the equations must be quadratic forms equivalent
with respect to orthogonal transformations. But the numbers $\lambda_i$ are defined as
the eigenvalues of the associated symmetric linear transformation, and therefore,
this also determines the numbers $a_i$. In the case of a cone (quadric of type I for
$c = 0$), all the numbers $\lambda_i$ can be multiplied by a common factor that is a positive
number (because of the assumptions made about $r$). This means that the numbers $a_i$
can be multiplied by an arbitrary positive common factor.

Let us note that although our line of reasoning was precisely the same as in the 
case of affine equivalence, the result that we obtained was different. We obtained 
relative to affine equivalence only a finite number of different types of inequivalent 
quadrics, while with respect to metric equivalence, the number is infinite: they are 
determined not only by a finite number of values of the index r, but also by arbi- 
trary numbers a; (which in the case of a cone are defined up to multiplication by a 
common positive factor). This fact is presented in a course in analytic geometry; for 
example, an ellipse with equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$

is defined by its semiaxes $a$ and $b$, and if for two ellipses these are different, then
the ellipses cannot be transformed into each other by a motion of the plane.

For arbitrary $n$, quadrics having a canonical equation (11.70) with $k = n$ and
$c = 1$ are called ellipsoids. The equation of an ellipsoid can be rewritten in the form

$$\sum_{i=1}^{n} \left(\frac{x_i}{a_i}\right)^2 = 1, \tag{11.72}$$

from which it follows that $|x_i / a_i| \le 1$ and hence $|x_i| \le a_i$. If the largest of these
numbers $a_1, \dots, a_n$ is denoted by $a$, then we obtain that $|x_i| \le a$. This property is
expressed by saying that the ellipsoid is a bounded set. The interested reader can
easily prove that among all quadrics, only ellipsoids have this property.

If we renumber the coordinates in such a way that in the equation of the ellipsoid
(11.72), the coefficients are $a_1 \ge a_2 \ge \cdots \ge a_n$, then we obtain

$$\left(\frac{x_i}{a_1}\right)^2 \le \left(\frac{x_i}{a_i}\right)^2 \le \left(\frac{x_i}{a_n}\right)^2,$$

whence, summing over $i$ and using (11.72), for every point $x = (x_1, \dots, x_n)$ lying on the ellipsoid, we have the inequality
$a_n \le |x| \le a_1$. This means that the distance from the center $O$ of the ellipsoid
to the point $x$ is not greater than to the point $A = (a_1, 0, \dots, 0)$ and not less than to
the point $B = (0, \dots, 0, a_n)$. These two points, or more precisely, the segments $OA$
and $OB$, are called the semimajor and semiminor axes of the ellipsoid.
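The bounds on the distance from the center can be illustrated by sampling. The sketch below assumes hypothetical semiaxes ordered so that the first is the largest and the last the smallest; it generates points of the ellipsoid by scaling random unit vectors and checks that their lengths lie between the smallest and largest semiaxis.

```python
import math
import random


def random_point_on_ellipsoid(axes, rng):
    """Scale a random unit vector u by the semiaxes: x_i = a_i * u_i."""
    while True:
        u = [rng.gauss(0.0, 1.0) for _ in axes]
        norm = math.sqrt(sum(c * c for c in u))
        if norm > 0:
            break
    return [a * c / norm for a, c in zip(axes, u)]


axes = [3.0, 2.0, 0.5]   # hypothetical values, largest first
rng = random.Random(0)   # fixed seed so the run is reproducible

all_within_bounds = True
for _ in range(1000):
    x = random_point_on_ellipsoid(axes, rng)
    # The point satisfies the ellipsoid equation up to rounding.
    on_surface = abs(sum((xi / a) ** 2 for xi, a in zip(x, axes)) - 1) < 1e-9
    length = math.sqrt(sum(xi * xi for xi in x))
    in_range = axes[-1] - 1e-9 <= length <= axes[0] + 1e-9
    all_within_bounds = all_within_bounds and on_surface and in_range

print(all_within_bounds)
```

The extreme values are attained at the endpoints of the largest and smallest semiaxes, which is the content of the inequality in the text.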


11.7 Quadrics in the Real Plane* 


In this section, we shall not be proving any new facts. Rather, our goal is to estab- 
lish a connection between results obtained earlier with facts familiar from analytic 
geometry, in particular, the interpretation of quadrics in the real plane as conic sec- 
tions, which was known already to the ancient Greeks. 

Let us begin by considering the simplest example, in which it is possible to see 
the difference between the affine and projective classifications of quadrics, that is, 
quadrics in the real affine and real projective planes. But for this, we must first refine 
(or recall) the statement of the problem. 

By the definition from Sect. 9.1, we may represent a projective space of arbitrary
dimension $n$ in the form $\mathbb{P}(L)$, where $L$ is a vector space of dimension $n + 1$. An
affine space of the same dimension $n$ can be considered the affine part of $\mathbb{P}(L)$,
determined by the condition $\varphi \neq 0$, where $\varphi$ is some nonnull linear function on $L$. It
can also be identified with the set $W_\varphi$, defined by the condition $\varphi(x) = 1$. This set is
an affine subspace of $L$ (we may view $L$ as its own space of vectors). In the sequel,
we shall make use of precisely this construction of an affine space.

A quadric Q in a projective space P(L) is given by an equation F(x) = 0, where 
F is a homogeneous second-degree polynomial. In the space L, the collection of all 
vectors for which F(x) = 0 forms a cone K. Let us recall that a cone is a set K
such that for every vector $x \in K$, the entire line $\langle x \rangle$ containing x is also contained
in K. A cone associated with a quadric is called a quadratic cone. From this point 
of view, the projective classification of quadrics coincides with the classification of 
quadratic cones with respect to nonsingular linear transformations. 

Thus an affine quadric Q can be represented in the form $W_\varphi \cap K$ using the
previously given notation $W_\varphi$ and K. Quadrics $Q_1 \subset W_{\varphi_1}$ and $Q_2 \subset W_{\varphi_2}$ are
by definition affinely equivalent if there exists a nonsingular affine transformation
$W_{\varphi_1} \to W_{\varphi_2}$ mapping $Q_1$ to $Q_2$. This means that we have a nonsingular linear
transformation $\mathcal{A}$ of the vector space L for which

$$\mathcal{A}(W_{\varphi_1}) = W_{\varphi_2} \quad \text{and} \quad \mathcal{A}(W_{\varphi_1} \cap K_1) = W_{\varphi_2} \cap K_2,$$

where $K_1$ and $K_2$ are quadratic cones associated with the quadrics $Q_1$ and $Q_2$.

First of all, let us examine how the mapping $\mathcal{A}$ acts on the set $W_\varphi$. To this end,
let us recall that in the space $L^*$ of linear functions on L there are defined dual
transformations $\mathcal{A}^*$ for which

$$\mathcal{A}^*(\varphi)(x) = \varphi(\mathcal{A}(x))$$

for all vectors $x \in L$ and $\varphi \in L^*$. In other words, this means that if $\mathcal{A}^*(\varphi) = \psi$,
then the linear function $\psi(x)$ is equal to $\varphi(\mathcal{A}(x))$. Since the transformation $\mathcal{A}$ is
nonsingular, the dual transformation $\mathcal{A}^*$ is also nonsingular, and therefore, there
exists an inverse transformation $(\mathcal{A}^*)^{-1}$. By definition, $(\mathcal{A}^*)^{-1}(\varphi)(\mathcal{A}(x)) = 1$ if
$\varphi(x) = 1$, that is, $\mathcal{A}$ takes $W_\varphi$ into the set $W_{(\mathcal{A}^*)^{-1}(\varphi)}$.

Since in previous sections, we considered only nonsingular projective quadrics, it
is natural to set corresponding restrictions in the affine case as well. To this end, we
shall use, as earlier, the representation of affine quadrics in the form $Q = W_\varphi \cap K$.
A quadratic cone K determines some projective quadric $\overline{Q}$. It is easy to express
this correspondence in coordinates. If we choose in L a system of coordinates
$(x_0, x_1, \ldots, x_n)$, then in $W_\varphi$ are defined inhomogeneous coordinates $y_1, \ldots, y_n$ by
the formula $y_i = x_i / x_0$. If the quadric Q is given by the second-degree equation

$$f(y_1, \ldots, y_n) = 0,$$

then the quadric $\overline{Q}$ (and cone K) is given by the equation

$$F(x_0, x_1, \ldots, x_n) = 0, \quad \text{where } F = x_0^2 \, f\!\left(\frac{x_1}{x_0}, \ldots, \frac{x_n}{x_0}\right).$$

Thus the projective quadric $\overline{Q}$ is uniquely defined by the affine quadric Q.


Definition 11.38 An affine quadric Q is said to be nonsingular if the associated
projective quadric $\overline{Q}$ is nonsingular.
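
For $n = 2$, the passage from $f$ to $F$ and the test of Definition 11.38 can be illustrated by a minimal Python sketch. It assumes that nonsingularity of the projective quadric means that the symmetric matrix of the form $F$ has nonzero determinant; the function names and the coefficient layout $(A, B, C, D, E, G)$ are hypothetical and serve only this illustration.

```python
from fractions import Fraction


def homogeneous_matrix(A, B, C, D, E, G):
    """Symmetric matrix of F(x0, x1, x2) = x0^2 * f(x1/x0, x2/x0) for
    f(y1, y2) = A*y1^2 + B*y1*y2 + C*y2^2 + D*y1 + E*y2 + G.
    Rows and columns are ordered (x0, x1, x2)."""
    A, B, C, D, E, G = (Fraction(v) for v in (A, B, C, D, E, G))
    return [
        [G,     D / 2, E / 2],
        [D / 2, A,     B / 2],
        [E / 2, B / 2, C],
    ]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def is_nonsingular(A, B, C, D, E, G):
    """True if the matrix of the homogenized form F is invertible
    (the assumed meaning of 'nonsingular' in this sketch)."""
    return det3(homogeneous_matrix(A, B, C, D, E, G)) != 0
```

For instance, the curve $y_1^2 + y_2 = 0$ gives a matrix with nonzero determinant, whereas the pair of lines $y_1^2 - y_2^2 = 0$ gives determinant zero and is therefore singular in this sense.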


In a space of arbitrary dimension n, all quadrics with canonical equations
(11.67)-(11.69) for $m < n$ are singular. Furthermore, a quadric of type (11.68) is
singular as well for $m = n$. Both these assertions can be verified directly from the
definitions; we have only to designate the coordinates $x_1, \ldots, x_n$ by $y_1, \ldots, y_n$,
introduce homogeneous coordinates $x_0 : x_1 : \cdots : x_n$, setting $y_i = x_i / x_0$, and multiply
all the equations by $x_0^2$. It is very easy to write down the matrix of a quadratic form
$F(x_0, x_1, \ldots, x_n)$.

In particular, for n = 2, we obtain three equations: 


$$y_1^2 + y_2^2 = 1, \qquad y_1^2 - y_2^2 = 1, \qquad y_1^2 + y_2 = 0. \tag{11.73}$$


From the results of Sect. 11.5, it follows that for n = 2, every nonsingular affine 
quadric is affinely equivalent to a quadric of one (and only one) of these three types. 
The corresponding quadrics are called ellipses, hyperbolas, and parabolas. 

On the other hand, in Sect. 11.4, we saw that all nonsingular projective quadrics 
are projectively equivalent. This result can serve as a graphic representation of affine 
quadrics. As we have seen, every affine quadric can be represented in the form
$Q = W_\varphi \cap K$, where K is some quadratic cone. It is affinely equivalent to the quadric

$$\mathcal{A}(W_\varphi \cap K) = W_{(\mathcal{A}^*)^{-1}(\varphi)} \cap \mathcal{A}(K),$$

where $\mathcal{A}$ is an arbitrary nonsingular linear transformation of the space L.

Here arises the specific nature of the case n = 2 (dim L = 3). By what has been
proved earlier, every cone K associated with a nonsingular quadric can be mapped
to every other such cone by a nonsingular transformation $\mathcal{A}$. In particular, we may
assume that $\mathcal{A}(K) = K_0$, where the cone $K_0$ is given in some coordinate system
$x_0, x_1, x_2$ of the space L by the equation $x_1^2 + x_2^2 = x_0^2$. This cone is obtained by
the rotation of one of its generatrices, that is, a line lying entirely on the cone (for
example, the line $x_1 = x_0$, $x_2 = 0$) about the axis $x_0$ (that is, the line $x_1 = x_2 = 0$). In
the cone $K_0$ that we have chosen, the angle between the generatrix and the axis $x_0$
is equal to $\pi/4$. In other words, this means that each pole of the cone $K_0$ is obtained
by a rotation of the sides of an isosceles right triangle around its bisector.

Setting $(\mathcal{A}^*)^{-1}(\varphi) = \psi$, we obtain that an arbitrary nonsingular affine quadric
is affinely equivalent to the quadric $W_\psi \cap K_0$. Here $W_\psi$ is an arbitrary plane in the
space L not passing through the vertex of the cone $K_0$, that is, through the point
O = (0, 0, 0). Thus every nonsingular affine quadric is affinely equivalent to a planar
section of a right circular cone. This explains the terminology conic used for
quadrics in the plane.

It is well known from analytic geometry how the three conics that we have found 
(ellipses, hyperbolas, and parabolas) are obtained from a single (from the point of 
view of projective classification) curve. If we begin with equations (11.73), then the 
difference in the three types is revealed by writing these equations in homogeneous 
coordinates. Setting $y_1 = x_1/x_0$ and $y_2 = x_2/x_0$, we obtain the equations

$$x_1^2 + x_2^2 = x_0^2, \qquad x_1^2 - x_2^2 = x_0^2, \qquad x_1^2 + x_0 x_2 = 0. \tag{11.74}$$

The differences among these equations can be found in the different natures of the
sets of intersection with the infinite line $l_\infty$ given by the equation $x_0 = 0$. For an
ellipse, this set is empty; for a hyperbola, it consists of two points, $(0 : 1 : 1)$ and
$(0 : 1 : -1)$, and for a parabola, it consists of the single point $(0 : 0 : 1)$ (substitution
into equation (11.73) shows that the line $l_\infty$ is tangent to the parabola at the point of
intersection); see Fig. 11.7.

Fig. 11.7 Intersection of a conic with an infinite line (panels: ellipse, hyperbola, parabola; the line $x_0 = 0$ is marked)
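
The three cases can be told apart mechanically. Setting $x_0 = 0$ in a homogeneous equation of a conic leaves a binary quadratic form in $x_1, x_2$, and the number of its real zeros $(x_1 : x_2)$ is governed by the sign of its discriminant. The following Python sketch illustrates this under the assumption that the conic is nonsingular; the function name and the coefficient names `A`, `B`, `C` are hypothetical.

```python
def points_at_infinity(A, B, C):
    """Number of real points of a conic on the line x0 = 0.

    Setting x0 = 0 leaves the binary form A*x1^2 + B*x1*x2 + C*x2^2;
    its real zeros (x1 : x2) are counted by the sign of B^2 - 4*A*C.
    """
    if A == 0 and B == 0 and C == 0:
        raise ValueError("the whole line x0 = 0 lies on the quadric")
    disc = B * B - 4 * A * C
    if disc < 0:
        return 0  # empty intersection: ellipse
    if disc == 0:
        return 1  # one (double) point: parabola, the line is tangent
    return 2      # two points: hyperbola
```

Applied to the quadratic parts $x_1^2 + x_2^2$, $x_1^2 - x_2^2$, and $x_1^2$ of the three canonical equations, the function returns 0, 2, and 1 respectively.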

We saw in Sect. 9.2 that an affine transformation coincides with a projective
transformation that preserves the line $l_\infty$. Therefore, the type of set $\overline{Q} \cap l_\infty$ (empty
set, two points, one point) should be the same for affinely equivalent quadrics Q. In
our case, the actual content of what we proved in Sect. 11.4 is that the type of set
$\overline{Q} \cap l_\infty$ determines the quadric Q up to affine equivalence.

But if we begin with the representation of a conic as the intersection of the cone
$K_0$ with the plane $W_\psi$, then different types appear due to a different disposition of
the plane $W_\psi$ with respect to the cone $K_0$. Let us recall that the vertex O of the cone
$K_0$ partitions it into two poles. If the equation of the cone has the form $x_1^2 + x_2^2 = x_0^2$,
then each pole is determined by the sign of $x_0$.

Let us denote by $L_\psi$ the plane parallel to $W_\psi$ and passing through the point O.
This plane is given by the equation $\psi = 0$. If $L_\psi$ has no points of intersection with
the cone $K_0$ other than O, then $W_\psi$ intersects one of its poles (for example, the one
within which lies the point of intersection of $W_\psi$ and the axis $x_0$). In this case, the conic
$W_\psi \cap K_0$ lies within one pole and is an ellipse.

For example, in the special case in which the plane $W_\psi$ is orthogonal to the axis
$x_0$, we obtain a circle. If we move the plane $W_\psi$ (for example, decrease its angle with
the axis $x_0$), then in its intersection with the cone $K_0$, an ellipse is obtained whose
eccentricity increases as the angle is decreased; see Fig. 11.8(a). The limiting position
is reached when the plane $L_\psi$ is tangent to the cone $K_0$ on a generatrix. Then
$W_\psi$ again intersects it in one pole (the one that contains the intersection with the axis
$x_0$). This intersection is a parabola; see Fig. 11.8(b). And if the plane $L_\psi$ intersects
$K_0$ in two different generatrices, then $W_\psi$ intersects both of its poles (on the side of
the plane $L_\psi$ on which is located the plane $W_\psi$ parallel to it). This intersection is a
hyperbola; see Fig. 11.8(c).
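
The trichotomy just described can be turned into a small computation. Suppose the plane is written as $c_0 x_0 + c_1 x_1 + c_2 x_2 = 1$ and the cone as $x_1^2 + x_2^2 = x_0^2$. A criterion that is not stated in the text, but follows from the Cauchy-Schwarz inequality, is that the parallel plane through O meets the cone only in O when $c_0^2 > c_1^2 + c_2^2$, touches it along one generatrix when $c_0^2 = c_1^2 + c_2^2$, and meets it in two generatrices when $c_0^2 < c_1^2 + c_2^2$. The Python sketch below encodes this assumption; the function name and the coefficient names are hypothetical.

```python
def classify_section(c0, c1, c2):
    """Type of the conic cut on the cone x1^2 + x2^2 = x0^2 by the plane
    c0*x0 + c1*x1 + c2*x2 = 1 (a plane that misses the vertex O)."""
    if c0 == 0 and c1 == 0 and c2 == 0:
        raise ValueError("the linear function must be nonzero")
    # Sign of c1^2 + c2^2 - c0^2 decides how the parallel plane through O
    # (c0*x0 + c1*x1 + c2*x2 = 0) sits relative to the cone.
    s = c1 * c1 + c2 * c2 - c0 * c0
    if s < 0:
        return "ellipse"    # parallel plane meets the cone only in O
    if s == 0:
        return "parabola"   # parallel plane is tangent along a generatrix
    return "hyperbola"      # parallel plane contains two generatrices
```

The plane $x_0 = 1$ orthogonal to the axis gives a circle, reported here as an ellipse; the plane $x_0 + x_1 = 1$ gives a parabola, and the plane $x_1 = 1$ a hyperbola.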

The connection between planar quadrics and conic sections is revealed particu- 
larly clearly by the metric classification of such quadrics, which forms part of any 
sufficiently rigorous course in analytic geometry. Let us recall only the main results. 

Fig. 11.8 Conic sections: (a) ellipse, (b) parabola, (c) hyperbola

As was done in Sect. 11.5, we must exclude from consideration those conics that
are cylinders and those that are unions of vector subspaces (that is, in our case, lines
or points). Then the results obtained in Sect. 11.5 give us (in coordinates x, y) the
following three types of conic:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \qquad \frac{x^2}{a^2} - \frac{y^2}{b^2} = 1, \qquad x^2 + a^2 y = 0, \tag{11.75}$$

where a > 0 and b > 0. From the point of view of affine classification presented
above, curves of the first type are ellipses, those of the second type are hyperbolas, 
and those of the third type are parabolas. 

Let us recall that in a course in analytic geometry, these curves are defined as 
geometric loci of points of the plane satisfying certain conditions. Namely, an ellipse 
is the geometric locus of points the sum of whose distances from two given points 
in the plane is constant. A hyperbola is defined analogously with sum replaced by 
difference. A parabola is the geometric locus of points equidistant from a given point 
and a given line that does not pass through the given point. 

There is an elegant and elementary proof of the fact that all ellipses, hyperbolas, 
and parabolas are not only affinely, but also metrically, that is, as geometric loci of 
points, equivalent to planar sections of a right circular cone. Let us recall that by 
right circular cone we mean a cone K in three-dimensional space obtained as the 
result of a rotation of a line about some other line, called the axis of the cone. The 
lines forming the cone are called its generatrices; they intersect the axis of the cone 
in one common point, called its vertex. 

In other words, this result means that the section of a right circular cone with a 
plane not passing through the vertex of the cone is either an ellipse, a hyperbola, or a 
parabola, and every ellipse, hyperbola, and parabola coincides with the intersection 
of a right circular cone with a suitable plane.⁵


⁵ The proof of this fact is due to the Franco-Belgian mathematician Germinal Pierre Dandelin. It
can be found, for example, in A.P. Veselov and E.V. Troitsky, Lectures in Analytic Geometry (in
Russian); B.N. Delone and D.A. Raikov, Analytic Geometry (in Russian); P. Dandelin, Mémoire
sur l’hyperboloide de révolution, et sur les hexagones de Pascal et de M. Brianchon; D. Hilbert
and S. Cohn-Vossen, Geometry and the Imagination.


Chapter 12 
Hyperbolic Geometry 


The discovery of hyperbolic (or Lobachevskian) geometry had an enormous impact 
on the development of mathematics and on how the relationship between mathemat- 
ics and the real world was understood. The discussions that swirled around the new 
geometry also seem to have influenced the views of many in the humanities, who, in 
this regard, unfortunately were too much taken by a literary image: the contrast be- 
tween “down-to-earth” Euclidean geometry and the “otherworldly” non-Euclidean 
geometry invented by learned mathematicians. It seemed that the difference between 
the two geometries was that in the first geometry, as was clear to everyone, parallel
lines do not intersect, while in the second, which ordinary intelligence found difficult
to comprehend, they do intersect. However, of course, this is exactly the opposite
of the truth: in the non-Euclidean geometry of Lobachevsky, given a point external 
to a given line, it is possible for infinitely many lines to pass through the point with- 
out intersecting the line. It is this that distinguishes Lobachevsky’s geometry from 
that of Euclid. 

Ivan Karamazov, in Dostoevsky’s novel The Brothers Karamazov, likely sowed 
confusion among those in the humanities with the following literary image: 


At the same time there were and are even now geometers and philosophers, even some of the 
most outstanding among them, who doubt that the whole universe, or, even more broadly, 
the whole of being, was created purely in accordance with Euclidean geometry; they even 
dare to dream that two parallel lines, which according to Euclid cannot possibly meet on 
earth, may perhaps meet somewhere in infinity. 


Around the time this novel was being written, Friedrich Engels wrote
Anti-Dühring, where an even more vivid image is used:


But in higher mathematics, another contradiction is achieved, that lines that intersect before 
our eyes, nevertheless a mere five or six centimeters from their point of intersection are to 
be considered parallel, that is, lines that cannot intersect even when extended to infinity. 


In this, the author sees the manifestation of some sort of “dialectic.” 

And even up to the present, it is possible to encounter, in print, such literary 
images that oppose Euclidean and non-Euclidean geometries by saying that in the 
former, parallel lines do not intersect, while in the latter, they “intersect somewhere
or other.” Usually, by non-Euclidean geometry is meant the hyperbolic geometry of
Lobachevsky, which is quite understandable by anyone who has passed a college 
course in some technical subject, and there are many such people today. To be sure, 
nowadays, this is presented in mathematics departments in more advanced courses 
in differential geometry. But hyperbolic geometry is so tightly linked to a first course 
in linear algebra, that it would be a pity not to say something about it here. 


12.1 Hyperbolic Space* 


In this chapter we shall be dealing exclusively with real vector spaces. 

We shall define hyperbolic space of dimension n, which we shall hereinafter
denote by $\mathbb{L}_n$, or simply $\mathbb{L}$ if we do not need to indicate the dimension, as a part of
n-dimensional projective space P(L), where L is a real vector space of dimension
n + 1. We shall denote the dimension of the space $\mathbb{L}$ by $\dim \mathbb{L}$.

Let us equip L with a pseudo-Euclidean product (x, y); see Sect. 7.7. Let us
recall that there, the quadratic form $(x^2)$ has index of inertia n, and in some basis
$e_1, \ldots, e_{n+1}$ (called orthonormal) for the vector

$$x = \alpha_1 e_1 + \cdots + \alpha_n e_n + \alpha_{n+1} e_{n+1}, \tag{12.1}$$

it takes the form

$$(x^2) = \alpha_1^2 + \cdots + \alpha_n^2 - \alpha_{n+1}^2. \tag{12.2}$$


In the pseudo-Euclidean space L, let us consider the light cone V defined by the
condition $(x^2) = 0$. We say that a vector a lies inside the cone V if $(a^2) < 0$ (recall
that in Chap. 7, we called such vectors timelike). It is obvious that the same then
holds as well for all nonnull vectors on the line $\langle a \rangle$, since $((\alpha a)^2) = \alpha^2 (a^2) < 0$
for every real $\alpha \neq 0$ (recall that we are considering this space over the field of real
numbers). Such lines are also said to lie inside the light cone V.

Points of the projective space P(L) corresponding to lines of the space L lying inside
the light cone V are called points of the space $\mathbb{L}$. Consequently, they correspond
to those lines $\langle x \rangle$ of the space L that in the form (12.1) satisfy the inequality

$$\alpha_1^2 + \cdots + \alpha_n^2 < \alpha_{n+1}^2. \tag{12.3}$$

In view of condition (12.3), the set $\mathbb{L} \subset P(L)$ is contained in one affine subset
$\alpha_{n+1} \neq 0$ (see Sect. 9.1). Indeed, in the case $\alpha_{n+1} = 0$, we would obtain in (12.3) the
inequality $\alpha_1^2 + \cdots + \alpha_n^2 < 0$, which is impossible in view of the fact that $\alpha_1, \ldots, \alpha_n$
are real. As we did previously in Sect. 9.1, we can identify the affine subset $\alpha_{n+1} \neq 0$
with the affine subspace $E : \alpha_{n+1} = 1$ and hence view $\mathbb{L}$ as a part of E; see Fig. 12.1.

The space of vectors of the affine space E is the vector subspace $E_0 \subset L$ defined
by the condition $\alpha_{n+1} = 0$. In other words, $E_0 = \langle e_1, \ldots, e_n \rangle$. Let us note that the
space of vectors $E_0$ is not simply a vector space. As a subspace of the pseudo-Euclidean
space L, it would seem that it should also be a pseudo-Euclidean space.


Fig. 12.1 Model of hyperbolic space

But in fact, as can be seen from formula (12.2), the inner product (x, y) makes it
a Euclidean space, in which the vectors $e_1, \ldots, e_n$ form an orthonormal basis. This
means that E is an affine Euclidean space, and the basis $e_1, \ldots, e_{n+1}$ of the space L
forms within it a frame of reference with respect to which a point of the hyperbolic
space $\mathbb{L} \subset E$ with coordinates $(y_1, \ldots, y_n)$ is characterized by the relationship

$$y_1^2 + \cdots + y_n^2 < 1, \qquad y_i = \frac{\alpha_i}{\alpha_{n+1}}, \quad i = 1, \ldots, n. \tag{12.4}$$

This set is called the interior of the unit sphere in E and will be denoted by U.
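
The identification in (12.4) amounts to dividing by the last coordinate. A minimal Python sketch of this step follows; the function name `to_ball` is hypothetical, and the vector is assumed to be given by its coordinates $\alpha_1, \ldots, \alpha_{n+1}$ in the orthonormal basis.

```python
from fractions import Fraction


def to_ball(alpha):
    """Map a timelike vector (alpha_1, ..., alpha_{n+1}) to the point
    (y_1, ..., y_n) with y_i = alpha_i / alpha_{n+1}, as in (12.4)."""
    *head, last = (Fraction(v) for v in alpha)
    # Condition (12.3): the vector must lie strictly inside the light cone.
    if sum(h * h for h in head) >= last * last:
        raise ValueError("vector is not timelike: condition (12.3) fails")
    return [h / last for h in head]
```

Proportional vectors give the same point, as they should: `to_ball([1, 1, 2])` and `to_ball([-3, -3, -6])` both yield the point with coordinates $(1/2, 1/2)$.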

Let us now turn our attention to identifying the subspaces of a hyperbolic space.
They correspond to those vector spaces $L' \subset L$ that have a common point with
the interior of the light cone V, that is, they contain a timelike vector $a \in L'$.
The inner product (x, y) defined in L is clearly also defined for all vectors in
the subspace $L' \subset L$. The space $L'$ contains the timelike vector a, and therefore,
by Lemma 7.53, it is a pseudo-Euclidean space, and therefore, the associated
hyperbolic space $\mathbb{L}' \subset P(L')$ is defined. Since $P(L') \subset P(L)$ is a projective subspace,
it follows that $\mathbb{L}' \subset P(L)$. But hyperbolic space $\mathbb{L}'$ is defined by the condition
$(x^2) < 0$ both in P(L) and in $P(L')$, and therefore, $\mathbb{L}' \subset \mathbb{L}$. Here by definition,
$\dim \mathbb{L}' = \dim P(L') = \dim L' - 1$. The hyperbolic space $\mathbb{L}'$ thus constructed is called
a subspace in $\mathbb{L}$.

In particular, if $L'$ is a hyperplane in L, then $\dim \mathbb{L}' = \dim \mathbb{L} - 1$, and then the
subspace $\mathbb{L}' \subset \mathbb{L}$ is called a hyperplane in $\mathbb{L}$.

In the sequel we shall require the partition of $\mathbb{L}$ into two parts by the hyperplane
$\mathbb{L}' \subset \mathbb{L}$:

$$\mathbb{L} \setminus \mathbb{L}' = \mathbb{L}^+ \cup \mathbb{L}^-, \qquad \mathbb{L}^+ \cap \mathbb{L}^- = \varnothing, \tag{12.5}$$

similar to how in Sect. 3.2, the partition of the vector space L into two half-spaces
was accomplished with the help of the hyperplane $L' \subset L$.

The partition (12.5) of the space $\mathbb{L}$ cannot be accomplished by an analogous
partition of the projective space P(L). Indeed, if we use the definition of the subsets
$L^+$ and $L^-$ from Sect. 3.2, then we see that for a vector $x \in L^+$, the vector $\alpha x$ is in
$L^-$ if $\alpha < 0$, so that the condition $x \in L^+$ does not hold for the line $\langle x \rangle$. But such a
partition is possible for the affine Euclidean space E; it was constructed in Sect. 8.2
(see p. 299).

Let us recall that the partition of the affine space E by the hyperplane $E' \subset E$
was defined via the partition of the space of vectors $E_0$ of the affine space E with
the aid of the hyperplane $E'_0 \subset E_0$ corresponding to the affine hyperplane $E'$, that
is, consisting of vectors $\overrightarrow{AB}$, where A and B are all possible points of $E'$. If we
are given a partition $E_0 \setminus E'_0 = E_0^+ \cup E_0^-$, then we must choose an arbitrary point
$O \in E'$ and define $E^+$ as the collection of all points $A \in E$ such that $\overrightarrow{OA} \in E_0^+$ ($E^-$
is defined analogously). The sets $E^+$ and $E^-$ thus obtained are called half-spaces,
and they do not depend on the choice of point $O \in E'$. Thus we have partitioned the
set $E \setminus E'$ into two half-spaces: $E \setminus E' = E^+ \cup E^-$.

Fig. 12.2 Hyperbolic half-spaces

Let $L'$ be a hyperplane in the pseudo-Euclidean space L having nonempty intersection
with the interior of the light cone V, and let $E'$ be the associated hyperplane
in the affine space E, that is, $E' = E \cap P(L')$. Then $E'$ has nonempty intersection
with the interior of the unit sphere U, given by relationship (12.4), and for the set
$\mathbb{L} \subset E$, we obtain the partition (12.5), where

$$\mathbb{L}' = \mathbb{L} \cap E', \qquad \mathbb{L}^+ = E^+ \cap \mathbb{L}, \qquad \mathbb{L}^- = E^- \cap \mathbb{L}. \tag{12.6}$$

The sets $\mathbb{L}^+$ and $\mathbb{L}^-$ defined by relationships (12.6) are called half-spaces of the
space $\mathbb{L}$.

To put it more simply, the hyperplane $E'$ divides the interior of the sphere $U \subset E$
identified with the space $\mathbb{L}$ into two parts, $U^+$ and $U^-$ (see Fig. 12.2), which correspond
to the half-spaces $\mathbb{L}^+$ and $\mathbb{L}^-$.

Let us show that both half-spaces $\mathbb{L}^+$ and $\mathbb{L}^-$ are nonempty, although Fig. 12.2
is sufficiently convincing by itself. We give the proof for $\mathbb{L}^+$ (for $\mathbb{L}^-$, the proof is
similar).

Let us consider an arbitrary point $O \in E' \cap \mathbb{L}$. It corresponds to the vector
$a = \alpha_1 e_1 + \cdots + \alpha_n e_n + e_{n+1}$ with $(a^2) < 0$ (see the definition of the affine space E on
p. 434). Let $c \in E_0^+$ be a vector and $B \in E^+$ the point such that $\overrightarrow{OB} = c$. Let us
consider the vectors $b_t = a + tc \in L$ and the points $B_t \in E$ corresponding to them, so that
$\overrightarrow{OB_t} = tc$, for varying values of $t \in \mathbb{R}$.
Let us note that if $t > 0$, then $B_t \in E^+$, and if here $(b_t^2) < 0$, then $B_t \in E^+ \cap \mathbb{L} = \mathbb{L}^+$.
As can be seen without difficulty, the scalar square $(b_t^2)$ is a quadratic trinomial
in t:

$$(b_t^2) = ((a + tc)^2) = (a^2) + 2t(a, c) + t^2(c^2) = P(t). \tag{12.7}$$

By our selection, the vector $c \neq 0$ belongs to the Euclidean space $E_0$, and therefore,
$(c^2) > 0$. On the other hand, by assumption, we have $(a^2) < 0$. This yields that
the discriminant of the quadratic trinomial P(t) on the right-hand side of relationship
(12.7) is positive, and therefore, P(t) has two real roots, $t_1$ and $t_2$, and from the
condition $(a^2) < 0$ it follows that they have different signs, that is, $t_1 t_2 < 0$. Then,
as is easy to see, $P(t) < 0$ for every t between the roots $t_1$ and $t_2$. We will choose a
positive such number t.
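
The choice of a suitable positive t can be made explicit. The Python sketch below computes the positive root of the trinomial $P(t) = (c^2)t^2 + 2(a, c)t + (a^2)$ and returns half of it, which lies strictly between the two roots. The function names `form` and `positive_t` are hypothetical, and vectors are assumed to be given by coordinates in the orthonormal basis, with the last coordinate entering the product with a minus sign.

```python
import math


def form(x, y):
    """Pseudo-Euclidean product in an orthonormal basis: the last
    coordinate enters with a minus sign, as in (12.2)."""
    return sum(p * q for p, q in zip(x[:-1], y[:-1])) - x[-1] * y[-1]


def positive_t(a, c):
    """Return some t > 0 with ((a + t*c)^2) < 0.

    a: timelike vector (a point of E inside the unit sphere);
    c: nonzero vector of E_0, so that (c^2) > 0.
    """
    qa, qc, ac = form(a, a), form(c, c), form(a, c)
    if not (qa < 0 < qc):
        raise ValueError("need (a^2) < 0 and (c^2) > 0")
    # P(t) = qc*t^2 + 2*ac*t + qa has roots t1 < 0 < t2 because qa/qc < 0.
    t2 = (-ac + math.sqrt(ac * ac - qa * qc)) / qc
    return t2 / 2  # strictly between 0 and the positive root
```

For example, with `a = (0.5, 0.0, 1.0)` and `c = (1.0, 0.0, 0.0)` the positive root is 0.5, and the returned value 0.25 gives the vector (0.75, 0, 1), which is again timelike.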

Since the hyperbolic space $\mathbb{L}$ can be viewed as a part of the affine space E,
we can transfer from E onto $\mathbb{L}$ the notion of line segment, the notion of lying
between for three points on a line segment, and the notion of convexity. An easy
verification (analogous to what we did at the end of Sect. 8.2) shows that the subsets
$\mathbb{L}^+$ and $\mathbb{L}^-$ introduced earlier of the set $\mathbb{L} \setminus \mathbb{L}'$ are characterized by the property of
convexity: if two points A, B are in $\mathbb{L}^+$, then all points lying on the segment [A, B]
are also in $\mathbb{L}^+$ (the same clearly holds for the subset $\mathbb{L}^-$).

Let us consider linear transformations $\mathcal{A}$ of a vector space L that are Lorentz
transformations with respect to a symmetric bilinear form $\varphi(x, y)$ corresponding
to the quadratic form $(x^2)$ and the associated projective transformations $P(\mathcal{A})$. The
latter transformations obviously take the set $\mathbb{L}$ to itself: given that a transformation
$\mathcal{A}$ is a Lorentz transformation and from the condition $(x^2) < 0$, it follows that
$(\mathcal{A}(x)^2) = (x^2) < 0$. The transformations of the set $\mathbb{L}$ that arise in this way are
called motions of the hyperbolic space $\mathbb{L}$.

Thus motions of the space $\mathbb{L}$ are projective transformations of the projective
space P(L) containing $\mathbb{L}$ and taking the quadratic form $(x^2)$ into itself. By what
we have said thus far, the definition of the interior of the light cone V can be written
in homogeneous coordinates in the form

$$x_1^2 + \cdots + x_n^2 - x_{n+1}^2 < 0, \tag{12.8}$$

and in inhomogeneous coordinates $y_i = x_i / x_{n+1}$ in the form

$$y_1^2 + \cdots + y_n^2 < 1. \tag{12.9}$$

We consider motions of a hyperbolic space as transformations of the set $\mathbb{L}$, that is,
as transformations taking the interior of the unit sphere given by condition (12.9)
into itself.

Let us write down some simple properties of motions: 


Property 12.1 The sequential application (composition) of two motions $f_1$ and $f_2$
(as transformations of the set $\mathbb{L}$) is again a motion.


This follows at once from the fact that the composition of nonsingular transformations
$\mathcal{A}_1$ and $\mathcal{A}_2$ is a nonsingular transformation, and this holds as well for the
corresponding projective transformations $P(\mathcal{A}_1)$ and $P(\mathcal{A}_2)$. Moreover, if $\mathcal{A}_1$ and
$\mathcal{A}_2$ are Lorentz transformations with respect to the bilinear form $\varphi(x, y)$, then the
result of their composition has the same property.


Property 12.2 A motion is a bijection of $\mathbb{L}$ to itself.


This assertion follows from the fact that the corresponding transformations
$\mathcal{A} : L \to L$ and $P(\mathcal{A}) : P(L) \to P(L)$ are bijections. But by the definition of a hyperbolic
space, it is also necessary to verify that every line contained in the interior of the light
cone V is the image of a similar such line. If we have the line $\langle a \rangle$ with a timelike
vector a, then we know already that there exists a vector b such that $\mathcal{A}(b) = a$.
Since $\mathcal{A}$ is a Lorentz transformation of a pseudo-Euclidean space L, we have the
relationship $(b^2) = (\mathcal{A}(b)^2) = (a^2) < 0$, from which it follows that the vector b is
also timelike. Thus the transformation $\mathcal{A}$ takes the line $\langle b \rangle$ lying inside V into the
line $\langle a \rangle$, also inside V.


Property 12.3 Like every bijection, a motion f has an inverse transformation $f^{-1}$.
It is also a motion.


The verification of this property is trivial. 

At first glance, it is not obvious that there are “sufficiently many” motions of a 
hyperbolic space. We shall establish this a bit later, but for now, we shall point out 
some important types of motions. 

A transformation g is of type (a) if $g = P(\mathcal{A})$, where $\mathcal{A}$ is a Lorentz transformation
of the space L such that $\mathcal{A}(e_{n+1}) = e_{n+1}$.

Since the basis $e_1, \ldots, e_{n+1}$ of the pseudo-Euclidean space L is orthonormal, we
have the decomposition

$$L = \langle e_{n+1} \rangle \oplus \langle e_{n+1} \rangle^\perp, \qquad \langle e_{n+1} \rangle^\perp = \langle e_1, \ldots, e_n \rangle, \tag{12.10}$$

and all transformations $\mathcal{A} : L \to L$ with the indicated property take the subspace
$E_0 = \langle e_1, \ldots, e_n \rangle$ into itself.

Conversely, if we define $\mathcal{A} : L \to L$ as an orthogonal transformation of the Euclidean
subspace $E_0$ and set $\mathcal{A}(e_{n+1}) = e_{n+1}$, then $P(\mathcal{A})$ will of course be a motion
of the hyperbolic space. In other words, these transformations can be described
as orthogonal transformations of inhomogeneous coordinates. All thus constructed
motions of the space $\mathbb{L}$ have the fixed point O corresponding to the line $\langle e_{n+1} \rangle$
in L, or in other words, the point O = (0, ..., 0) in the inhomogeneous system of
coordinates $(y_1, \ldots, y_n)$.

From the point of view of hyperbolic space, the constructed motions precisely coincide
with those motions that leave the point $O \in \mathbb{L}$ fixed. Indeed, as we have seen,
the point O corresponds to the line $\langle e_{n+1} \rangle$, and the motion g is equal to $P(\mathcal{A})$, where
$\mathcal{A}$ is a Lorentz transformation of the space L. The condition g(O) = O means that
$\mathcal{A}(\langle e_{n+1} \rangle) = \langle e_{n+1} \rangle$, that is, $\mathcal{A}(e_{n+1}) = \lambda e_{n+1}$. From the fact that $\mathcal{A}$ is a Lorentz
transformation, it follows that $\lambda = \pm 1$. By multiplying $\mathcal{A}$ by $\pm 1$, which obviously
does not change the transformation $g = P(\mathcal{A})$, we can obtain that the condition
$\mathcal{A}(e_{n+1}) = e_{n+1}$ is satisfied, whence by definition, it follows that g is a transformation
of type (a).

Type (b) is connected with a certain line $\mathbb{L}_1 \subset \mathbb{L}$ of a hyperbolic space. By definition,
the line $\mathbb{L}_1$ is determined by the plane $L' \subset L$, $\dim L' = 2$. Since by assumption,
the plane $L'$ must contain at least one timelike vector x, it follows by Lemma 7.53
(p. 271) that it is a pseudo-Euclidean space. From formula (6.28) and Theorem 6.17
(law of inertia), it follows that all such spaces of a given dimension are isomorphic.
Therefore, we can choose a basis in $L'$ with any convenient Gram matrix, provided
only that it defines a pseudo-Euclidean plane. We have seen (in Example 7.49,
p. 269) that it is convenient to choose as such a basis the lightlike vectors $f_1, f_2$,
for which

$$(f_1^2) = (f_2^2) = 0, \qquad (f_1, f_2) = \frac{1}{2},$$

and this means that for every vector $x = x f_1 + y f_2$, its scalar square $(x^2)$ is equal
to xy. In Example 7.61 (p. 277), we found explicit formulas for the Lorentz transformations
of a pseudo-Euclidean plane in such a basis:

$$U(f_1) = \alpha f_1, \qquad U(f_2) = \alpha^{-1} f_2, \tag{12.11}$$

or

$$U(f_1) = \alpha f_2, \qquad U(f_2) = \alpha^{-1} f_1, \tag{12.12}$$

where $\alpha$ is an arbitrary nonzero number. In the sequel we shall need only transformations
given by formula (12.11).

Since $L'$ is a nondegenerate space, it follows that by Theorem 6.9, we have the
decomposition $L = L' \oplus (L')^\perp$. Let us now define a linear transformation $\mathcal{A}$ of the
space L by the condition

$$\mathcal{A}(x + y) = U(x) + y, \quad \text{where } x \in L',\ y \in (L')^\perp, \tag{12.13}$$

where U is one of the Lorentz transformations of the pseudo-Euclidean plane $L'$
defined by formulas (12.11) and (12.12). It is clear that then $\mathcal{A}$ is a Lorentz transformation
of the space L.

A motion of type (b) of the space $\mathbb{L}$ is a transformation $P(\mathcal{A})$ obtained in the
case that in formula (12.13), we take as U the transformation given by relationships
(12.11). All motions thus constructed have a fixed line $\mathbb{L}_1$ corresponding to
the plane $L'$.

It is quite obvious that motions of types (a) and (b) do not exhaust all motions of
the hyperbolic plane, even if in the definition of motions of type (b), as U in formula
(12.13) we were to use transformations U given not only by relationships (12.11),
but also by (12.12). For example, they certainly do not include motions associated
with Lorentz transformations that have a three-dimensional cyclic subspace (see
Corollary 7.66 and Example 7.67). However, for our further purposes, it will suffice
to use only motions of these two types.


Example 12.4 In the sequel we are going to require explicit formulas for transformations
of type (b) in the case of the hyperbolic plane (that is, for n = 2). In this
case, L is a three-dimensional pseudo-Euclidean space, and in the orthonormal basis
$e_1, e_2, e_3$ such that

$$(e_1^2) = 1, \qquad (e_2^2) = 1, \qquad (e_3^2) = -1,$$

the scalar square of the vector $x = x_1 e_1 + x_2 e_2 + x_3 e_3$ is equal to $(x^2) = x_1^2 + x_2^2 - x_3^2$.
The points of the hyperbolic plane $\mathbb{L}$ are contained in the affine plane
$x_3 = 1$, have inhomogeneous coordinates $x = x_1/x_3$ and $y = x_2/x_3$, and satisfy the
relationship $x^2 + y^2 < 1$.

For writing the transformation $\mathcal{A}$, let us consider the pseudo-Euclidean plane
$L' = \langle e_1, e_3 \rangle$ and let us choose in it a basis consisting of lightlike vectors $f_1, f_2$
associated with vectors $e_1, e_3$ by the relationships

$$f_1 = \frac{e_1 + e_3}{2}, \qquad f_2 = \frac{e_1 - e_3}{2}, \tag{12.14}$$

from which we also obtain the inverse formulas $e_1 = f_1 + f_2$ and $e_3 = f_1 - f_2$.

Let us note that the orthogonal complement $(L')^\perp$ equals $\langle e_2 \rangle$, and by Theorem 6.9,
we have the decomposition $L = L' \oplus \langle e_2 \rangle$. Then in accord with formula
(12.13), for the vector z = x + y, where $x \in L'$ and $y \in \langle e_2 \rangle$, we obtain the value
$\mathcal{A}(z) = U(x) + y$, where $U : L' \to L'$ is the Lorentz transformation defined in the
basis $f_1, f_2$ by formula (12.11). From this, taking into account expression (12.14),
we obtain

$$U(e_1) = \frac{\alpha + \alpha^{-1}}{2} e_1 + \frac{\alpha - \alpha^{-1}}{2} e_3, \qquad U(e_3) = \frac{\alpha - \alpha^{-1}}{2} e_1 + \frac{\alpha + \alpha^{-1}}{2} e_3.$$

Let us set

$$a = \frac{\alpha + \alpha^{-1}}{2}, \qquad b = \frac{\alpha - \alpha^{-1}}{2}. \tag{12.15}$$

Then $a + b = \alpha$ and $a^2 - b^2 = 1$. It is obvious that any numbers a and b satisfying
these relationships can be defined in terms of the number $\alpha = a + b$ by formulas
(12.15). Therefore, we obtain the linear transformation $\mathcal{A} : L \to L$, for which

$$\mathcal{A}(e_1) = a e_1 + b e_3, \qquad \mathcal{A}(e_2) = e_2, \qquad \mathcal{A}(e_3) = b e_1 + a e_3.$$

It is easy to see that for such a transformation, the vector $x = x_1 e_1 + x_2 e_2 + x_3 e_3$ is
carried to the vector

$$\mathcal{A}(x) = (a x_1 + b x_3) e_1 + x_2 e_2 + (b x_1 + a x_3) e_3.$$

In inhomogeneous coordinates, $x = x_1/x_3$ and $y = x_2/x_3$. This means that a point
with coordinates (x, y) is carried to the point with coordinates (x', y'), where

$$x' = \frac{ax + b}{bx + a}, \qquad y' = \frac{y}{bx + a}, \qquad a^2 - b^2 = 1. \tag{12.16}$$
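
Formulas (12.15) and (12.16) are easy to experiment with. The following Python sketch builds the numbers a and b from a given nonzero $\alpha$ and returns the corresponding map of the unit disk; exact rational arithmetic is used so that the relation $a^2 - b^2 = 1$ can be checked without rounding. The function name `type_b_motion` is hypothetical.

```python
from fractions import Fraction


def type_b_motion(alpha):
    """Return (a, b, move) for a nonzero rational alpha, where
    a = (alpha + 1/alpha)/2, b = (alpha - 1/alpha)/2 as in (12.15),
    and move(x, y) applies the transformation (12.16)."""
    alpha = Fraction(alpha)
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    a = (alpha + 1 / alpha) / 2
    b = (alpha - 1 / alpha) / 2

    def move(x, y):
        d = b * x + a  # common denominator b*x + a of (12.16)
        return (a * x + b) / d, y / d

    return a, b, move
```

For $\alpha = 2$ this gives $a = 5/4$, $b = 3/4$; the center (0, 0) of the disk is carried to (3/5, 0), and the point (1/2, 1/2) to (11/13, 4/13), which again satisfies $x^2 + y^2 < 1$.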


This particular type of motion yields, however, an important general property: 


Theorem 12.5 For every pair of points of a hyperbolic space there exists a motion 
taking one point into the other. 


12.1 Hyperbolic Space* 441 


Proof Let the first point correspond to the line (a), and the second to the line (b), 
where a, b € L. If the vectors a and b are proportional, that is, (a) = (b), then our 
requirements will be satisfied by the identity transformation of the space L (which 
can be obtained in the form P(&), where & is the identity transformation of the 
space L). 

But if (a) 4 (b), that is, dim(a, b) = 2, then let us set L’ = (a, b). Let us consider 
the Lorentz transformation U: L’' > L’ of type (b) given by formula (12.11), the 
corresponding Lorentz transformation A:L-— L defined by formula (12.13), and 
the projective transformation P(.A) : P(L) > P(L). 

Let us show that the constructed projective transformation $\mathbb{P}(\mathcal{A})$ takes a point
corresponding to the line $\langle a \rangle$ to a point corresponding to the line $\langle b \rangle$, that is, the
linear transformation $\mathcal{A}\colon \mathsf{L} \to \mathsf{L}$ takes the line $\langle a \rangle$ to the line $\langle b \rangle$. Since the vectors $a$
and $b$ are contained in the plane $\mathsf{L}'$, it suffices by definition to prove
that for an appropriate choice of the number $\alpha$, the transformation $\mathcal{U}\colon \mathsf{L}' \to \mathsf{L}'$ given by
formula (12.11) takes the line $\langle a \rangle$ to the line $\langle b \rangle$.

This is easily verified by a simple calculation using the basis $f_1, f_2$, given by
formula (12.14), in the pseudo-Euclidean plane $\mathsf{L}'$. Let us consider the timelike
vectors $a = a_1 f_1 + a_2 f_2$ and $b = b_1 f_1 + b_2 f_2$. Since in the chosen basis, the
scalar square of a vector is equal to the product of its coordinates, it follows that
$(a^2) = a_1 a_2 < 0$ and $(b^2) = b_1 b_2 < 0$. From this, it follows in particular that all
numbers $a_1, a_2, b_1, b_2$ are nonzero.

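We obtain from formula (12.11) that $\mathcal{U}(a) = \alpha a_1 f_1 + \alpha^{-1} a_2 f_2$, and the condition
$\langle \mathcal{U}(a) \rangle = \langle b \rangle$ means that $\mathcal{U}(a) = \mu b$ for some $\mu \ne 0$. This yields the relationships
$\alpha a_1 = \mu b_1$ and $\alpha^{-1} a_2 = \mu b_2$, that is,


$$\mu = \frac{\alpha a_1}{b_1}, \qquad \alpha^{-1} a_2 = \mu b_2 = \frac{\alpha a_1 b_2}{b_1}, \qquad \alpha^2 = \frac{a_2 b_1}{a_1 b_2} = \frac{a_1 a_2 b_1 b_2}{(a_1 b_2)^2}.$$

It is obvious that the latter relationship can be solved for a real number $\alpha$ if
$a_1 a_2 b_1 b_2 > 0$, and this inequality is satisfied, since by assumption, $a_1 a_2 < 0$ and
$b_1 b_2 < 0$.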
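The calculation in this proof can be traced numerically. The Python sketch below is an illustration under our own assumptions: the names `lorentz_alpha` and `apply_u` are hypothetical, vectors are given by their coordinates in the basis $f_1, f_2$, and we take the positive root $\alpha > 0$ of $\alpha^2 = a_2 b_1/(a_1 b_2)$. It then applies $\mathcal{U}$ in the form $(v_1, v_2) \mapsto (\alpha v_1, \alpha^{-1} v_2)$ used above.

```python
import math


def lorentz_alpha(a, b):
    """Return alpha > 0 with alpha^2 = a2*b1 / (a1*b2) for timelike a, b.

    a and b are coordinate pairs in the basis f1, f2, in which the scalar
    square of a vector equals the product of its coordinates.
    """
    a1, a2 = a
    b1, b2 = b
    if a1 * a2 >= 0 or b1 * b2 >= 0:
        raise ValueError("both vectors must be timelike: a1*a2 < 0 and b1*b2 < 0")
    return math.sqrt((a2 * b1) / (a1 * b2))


def apply_u(alpha, v):
    """Apply U in the basis f1, f2: (v1, v2) -> (alpha*v1, v2/alpha)."""
    return alpha * v[0], v[1] / alpha


if __name__ == "__main__":
    a, b = (1.0, -1.0), (2.0, -3.0)
    alpha = lorentz_alpha(a, b)
    u = apply_u(alpha, a)
    # U(a) is proportional to b exactly when this 2x2 determinant vanishes.
    print("alpha =", alpha, " determinant =", u[0] * b[1] - u[1] * b[0])
```

The proportionality factor $\mu = \alpha a_1/b_1$ may turn out negative; this is harmless, since only the line $\langle b \rangle$ matters.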


Let us note that we have thus far not used motions of type (a). We shall need 
them to strengthen the theorem we have just proved. To do so, we shall make use of 
the notion of a flag, analogous to that introduced in Sect. 3.2 for real vector spaces. 


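Definition 12.6 A flag in a space $\mathbb{L}$ is a sequence of subspaces
$$\mathbb{L}_0 \subset \mathbb{L}_1 \subset \cdots \subset \mathbb{L}_n = \mathbb{L} \tag{12.17}$$
such that:
(a) $\dim \mathbb{L}_i = i$ for all $i = 0, 1, \ldots, n$;
(b) each pair of subspaces $(\mathbb{L}_{i+1}, \mathbb{L}_i)$ is directed.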


A subspace $\mathbb{L}_i$ is a hyperplane in $\mathbb{L}_{i+1}$, and as we have seen (see formula (12.5)),
it defines a partition of $\mathbb{L}_{i+1}$ into two half-spaces: $\mathbb{L}_{i+1} \setminus \mathbb{L}_i = \mathbb{L}_{i+1}^{+} \cup \mathbb{L}_{i+1}^{-}$. And as
earlier, the pair $(\mathbb{L}_{i+1}, \mathbb{L}_i)$ is said to be directed if the order of the half-spaces is
indicated, for example by denoting them by $\mathbb{L}_{i+1}^{+}$ and $\mathbb{L}_{i+1}^{-}$. Let us note that in a
flag defined by the sequence (12.17), the subspace $\mathbb{L}_0$ has dimension 0, that is, it
consists of a single point. We shall call this point the center of the flag (12.17).


Theorem 12.7 For any two flags of a hyperbolic space, there exists a motion taking 
the first flag to the second. Such a motion is unique. 


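Proof In the space $\mathbb{L}$, let us consider two flags $\Phi$ and $\Phi'$ with centers at the points
$P \in \mathbb{L}$ and $P' \in \mathbb{L}$, respectively. Let $O \in \mathbb{L}$ be the point corresponding to the line
$\langle e_{n+1} \rangle$ in $\mathsf{L}$, that is, the point with coordinates $y_1 = 0, \ldots, y_n = 0$ in relationship
(12.4). By Theorem 12.5, there exist motions $f$ and $f'$ taking $P$ to $O$ and $P'$ to $O$.
Then the flags $f(\Phi)$ and $f'(\Phi')$ have their centers at the point $O$. Each flag is by
definition a sequence of subspaces (12.17) in $\mathbb{L}$ to which correspond the subspaces
of the vector space $\mathsf{L}$. Thus to the flags $f(\Phi)$ and $f'(\Phi')$ there correspond two
sequences of vector subspaces,


$$\langle e_{n+1} \rangle = \mathsf{L}_0 \subset \mathsf{L}_1 \subset \cdots \subset \mathsf{L}_n = \mathsf{L} \quad \text{and} \quad \langle e_{n+1} \rangle = \mathsf{L}'_0 \subset \mathsf{L}'_1 \subset \cdots \subset \mathsf{L}'_n = \mathsf{L},$$


where $\dim \mathsf{L}_i = \dim \mathsf{L}'_i = i + 1$ for all $i = 0, 1, \ldots, n$.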

Let us recall that the space $\mathbb{L}$ is identified with a part of the affine Euclidean space
$E$, namely with the interior of the unit sphere $U \subset E$ given by relationship (12.4). To
investigate $\mathbb{L}$ as a part of $E$ (see Fig. 12.1), it will be convenient for us to associate
with each subspace $\mathsf{M} \subset \mathsf{L}$ containing the vector $e_{n+1}$ the affine subspace $N \subset E$
of dimension one less containing the point $O$ (we write $\mathsf{N}$ for a vector subspace and
$N$ for the corresponding affine subspace). To this end, let us first associate
with each subspace $\mathsf{M} \subset \mathsf{L}$ containing the vector $e_{n+1}$ the vector subspace $\mathsf{N} \subset \mathsf{M}$
determined by the decomposition $\mathsf{M} = \langle e_{n+1} \rangle \oplus \mathsf{N}$. Employing notation introduced
earlier, we obtain that


$$\mathsf{N} = (\langle e_{n+1} \rangle^{\perp} \cap \mathsf{M}) = (\langle e_1, \ldots, e_n \rangle \cap \mathsf{M}) \subset \langle e_1, \ldots, e_n \rangle = E_0,$$


that is, $\mathsf{N}$ is contained in the space of vectors of the affine space $E$. Consequently,
the vector subspace $\mathsf{N} \subset E_0$ determines a set of parallel affine subspaces in $E$ that
are characterized by their spaces of vectors coinciding with $\mathsf{N}$. Such affine subspaces
can be mapped to each other by a translation (see p. 296), and to determine one of
them uniquely, it suffices simply to designate a point contained in this subspace.
As such a point, we shall choose $O$. Then the vector subspace $\mathsf{N} \subset E_0$ uniquely
determines the affine subspace $N \subset E$, where clearly, $\dim N = \dim \mathsf{N} = \dim \mathsf{M} - 1$.

Thus we have established a bijection between $k$-dimensional vector subspaces
$\mathsf{M} \subset \mathsf{L}$ containing the vector $e_{n+1}$ and $(k - 1)$-dimensional affine subspaces $N \subset E$
containing the point $O$. Here clearly, the notions of directedness for the pairs $\mathsf{M}' \subset \mathsf{M}$
and $N' \subset N$ coincide. In particular, the flags $f(\Phi)$ and $f'(\Phi')$ of the space $\mathbb{L}$ with
center $O$ correspond to two particular flags of the affine Euclidean space $E$ with
center at the point $O$.




By Theorem 8.40 (p. 316), in an affine Euclidean space, there exists for every
pair of flags a motion that takes the first flag to the second. Since in our case, both
flags have a common center $O$, it follows that this motion has the fixed point $O$,
and by Theorem 8.39, it is an orthogonal transformation of the Euclidean space
$E_0$. Let us consider $g = \mathbb{P}(\mathcal{A})$, the motion of type (a) of the space $\mathbb{L}$ corresponding
to this orthogonal transformation. Clearly, it takes the flag $f(\Phi)$ to $f'(\Phi')$, that
is, $g f(\Phi) = f'(\Phi')$. From this, we obtain that $f'^{-1} g f(\Phi) = \Phi'$, as asserted in the
theorem.

It remains to prove the assertion about uniqueness in the statement of the theorem.
Let $f_1$ and $f_2$ be two motions taking some flag $\Phi$ with center at the point $P$
to the same flag, that is, such that $f_1(\Phi) = f_2(\Phi)$. Then $f = f_1^{-1} f_2$ is a motion,
and $f(\Phi) = \Phi$. If we prove that $f$ is the identity transformation, then the required
equality $f_1 = f_2$ will follow.

By Theorem 12.5, there exists a motion $g$ taking the point $P$ to $O$. Let us set $\Phi' =
g(\Phi)$. Then $\Phi'$ is a flag with center at the point $O$. From the equalities $f(\Phi) = \Phi$
and $g(\Phi) = \Phi'$ it follows that $g f g^{-1}(\Phi') = \Phi'$. Let us denote the motion $g f g^{-1}$
by $h$. It clearly takes the flag $\Phi'$ to itself, and in particular, has the property that
$h(O) = O$. From what we said on p. 438, it follows that $h$ is a motion of type (a),
that is, $h = \mathbb{P}(\mathcal{A})$, where $\mathcal{A}$ is a Lorentz transformation of the space $\mathsf{L}$ that in turn
is determined by a certain orthogonal transformation $\mathcal{U}$ of the Euclidean space $E_0$.

Let $\Phi''$ be the flag in the Euclidean space $E_0$ corresponding to the flag $\Phi'$ of the
space $\mathbb{L}$. Then from the condition $h(\Phi') = \Phi'$, it follows that $\mathcal{U}(\Phi'') = \Phi''$, that
is, the transformation $\mathcal{U}$ takes the flag $\Phi''$ to itself. Consequently (see p. 225), the
transformation $\mathcal{U}$ is the identity, which yields that the motion $h$ that it defines is the
identity. From the relationship $h = g f g^{-1}$, it then follows that $g f = g$, that is, $f$ is
the identity transformation.


Thus motions of a hyperbolic space possess the same property as that established 
in Sect. 8.4 (p. 317) for motions of affine Euclidean spaces. It is this that explains 
the special place of hyperbolic spaces in geometry. The Norwegian mathematician 
Sophus Lie called this property “free mobility.” There exists a theorem (which we 
shall not only not prove, but not even formulate precisely) showing that other than 
the space of Euclid and the hyperbolic space of Lobachevsky, there is only one 
space that exhibits this property, called a Riemann space (we shall have a bit to say 
about this in Sect. 12.3). This assertion is called the Helmholtz—Lie theorem. For its 
formulation, it would be necessary first of all to define just what we mean here by 
“space,” but we are not going to delve into this. 

The property that we have deduced (Theorem 12.7) suffices for discussing the 
axiomatic foundations of hyperbolic geometry. 


12.2 The Axioms of Plane Geometry* 


Hyperbolic geometry arose historically as a result of the analysis of the axiomatic 
systems of Euclidean geometry. The viewpoint toward geometry as based on a small
number of postulates from which all the remaining results are derived by way of
formal proof arose in ancient Greece approximately in the sixth century B.C.E. Tra- 
dition connects this viewpoint with the name Pythagoras. An account of geometry 
with this point of view is contained in Euclid’s Elements (third century B.C.E.). This 
point of view was accepted during the development of science in the modern era, 
and for a long time, geometry was taught directly from Euclid’s books, and then 
later, there appeared simplified accounts. Moreover, this same point of view came 
to permeate all of mathematics and physics. In this spirit were written, for example, 
Newton’s The Mathematical Principles of Natural Philosophy, known as the Prin- 
cipia. In physics and generally in the natural sciences, “laws of nature” played the 
role of axioms. 

In mathematics, this direction of thought led to a more thorough working out of 
the axiom system of Euclidean geometry. Euclid divides the assertions on which his 
exposition is based into three types. One he calls “definitions”; another, “axioms”; 
and the third, “postulates” (the principle separating the last two of these is unclear 
to modern researchers). Many of his “definitions” also seem questionable. For ex- 
ample, the following: “A line is a length without width” (definitions of “length” 
and “width” are not given). Some “axioms” and “postulates” (we shall call all of 
these axioms) are simple corollaries of others, so that they could as well have been 
discarded. But what attracted the most attention was the “fifth postulate,” which in
Euclid is formulated thus: 


That if a straight line falling on two straight lines makes the interior angles on the same side 
less than two right angles, the two straight lines, if produced indefinitely, meet on that side 
on which are the angles less than the two right angles. 


This axiom differs from the others in that its formulation is notably more com- 
plex. Therefore, the following question arose (probably already in antiquity): can 
this assertion be proved as a theorem derived from the other axioms? An enormous 
number of “proofs of the fifth postulate” appeared, in which, however, there was 
always found a logical error. These investigations nevertheless helped in clarifying 
the situation. For example, it was proved that in the context of the other axioms, 
the fifth postulate is equivalent to the following assertion about parallel lines that is 
now usually presented as this postulate: through every point A not lying on a line 
a, it is possible to construct exactly one line b parallel to a (lines a and b are said 
to be parallel if they do not intersect). Here the existence of a line b parallel to a 
and passing through the point A can easily be proved. The entire content of the fifth 
postulate is reduced to the assertion about its uniqueness. 

Finally, at the beginning of the nineteenth century, a number of researchers, one 
of whom was Nikolai Ivanovich Lobachevsky (1792-1856), came up with the idea 
that a proof of the fifth postulate is impossible, and so its negation leads to a new 
geometry, logically no less perfect than the geometry of Euclid, even though it con- 
tains in some respects some unusual propositions and relationships. 

The question could be posed more precisely as a result of the development of the 
axiomatic method. This was done by Moritz Pasch (1843-1930), Giuseppe Peano 
(1858-1932), and David Hilbert (1862-1943) at the end of the nineteenth century. 
In his work on the foundations of geometry, Hilbert formulated in particular the
principles on which an axiomatic system is constructed. Today, such an approach
has become commonplace; we used it to define vectors and Euclidean spaces. The 
general principle consists in fixing a certain set of objects, which remain undefined 
(for example, in the case of the definition of a vector space, these were scalars and 
vectors), and also in fixing certain relations that are to exist among these objects, 
which are likewise undefined (in the case of the definition of a vector space, these 
were addition of vectors and multiplication of a vector by a scalar). Finally, axioms 
are introduced that establish the specific properties of the introduced concepts (in the 
case of the definition of a vector space, these were enumerated in Sect. 3.1). With 
such a formulation, there remains only the question of consistency of the theory, 
that is, whether it is possible from the given axioms to derive simultaneously some 
statement as well as its negation. In the sequel, we shall introduce an axiom system
for hyperbolic geometry (restricting ourselves to the case of dimension 2) and discuss the
question of its consistency.

Let us begin with a discussion of axioms. The lists of axioms that Hilbert and 
his predecessors introduced in their early work turned out to possess certain logi- 
cal defects. For example, in deduction, it turned out to be necessary to use certain 
assertions that were not contained among the axioms. Hilbert then supplemented 
his system of axioms. Later, this system of axioms was simplified for the sake of 
clarity. We shall use the axiom system proposed by the German geometer Friedrich 
Schur (1856–1932).¹ Here we shall restrict our attention (exclusively for the sake of
brevity) to the axiomatics of the plane. 

A plane is a certain set $\Pi$, whose elements A, B, and so on, are called points.
Certain bijective mappings $f\colon \Pi \to \Pi$ are called motions. These are the fundamental
objects. The relationships among them are expressed as follows:


(A) Certain distinguished subsets $l$, $l'$, and so on, of the set $\Pi$ are called lines. That
an element $A \in \Pi$ belongs to the subset $l$ is expressed by saying that “the point
A lies on the line $l$” or “the line $l$ passes through the point A.”

(B) For three given points A, B, C lying on a given line $l$, it is specified when the
point C is considered to lie between the points A and B. This must be specified
for every line $l$ and for every three points lying on it.


These objects and relations satisfy the conditions called axioms, which it is con- 
venient to collect into several groups: 


I. Axioms of relationship 

1. For every two points, there exists a line passing through them. 

2. If these points are distinct, then such a line is unique. 

3. On every line there lie at least two points. 

4. For every line, there exists a point not lying on it. 

II. Axioms of order

1. If on some line $l$, the point C lies between points A and B, then it is distinct
from them and also lies between points B and A.

2. If A and C are two distinct points on some line, then on this line there is at
least one point B such that C lies between points A and B.

3. Among three points A, B, and C lying on a given line, not more than one of
the points lies between the two others.


¹ Here we shall follow the ideas of Boris Nikolaevich Delaunay, or Delone (1890–1980), in his
pamphlet Elementary Proof of the Consistency of Hyperbolic Geometry, 1956.


Fig. 12.3 Intersection of the sides of a triangle by a line


Before formulating the last axiom of this group, let us give some new definitions. 
The set of all points C on a given line $l$ passing through the points A and B that
lie between them (including the points A and B themselves) is called a segment
with endpoints A and B, and is denoted by [A, B]. Axiom 2 of group II can be
reformulated thus: $[A, C] \ne l \setminus (A \cup C)$, with the inequality here being understood
as an inequality of sets. That a segment [A, B] contains points other than A and B 
is proved on the basis of the axioms of group I and the last axiom of group II, to 
the formulation of which we now turn. Three points A, B,C not all lying on any 
one line are called a triangle, and this relationship is denoted by [A, B, C]. The 
segments [A, B], [B, C], and [C, A] are called the sides of the triangle [A, B, C]. 


4. Pasch’s axiom. If points A, B, C do not all lie on the same line, none of them
belongs to the line $l$, and the line $l$ intersects one side of the triangle [A, B, C],
then it also intersects another side of the triangle.


In other words, if a line $l$ has a point D in common with the line $l'$ passing
through points A and B, with D lying between A and B on $l'$, then the line $l$ either
has a common point E with the line $l_1$ passing through B and C, with E lying
between them on $l_1$, or has a common point F with the line $l_2$ passing through A
and C, with F lying between them on $l_2$. The two cases discussed in this last axiom
are depicted in Fig. 12.3.


III. Axioms of motion

1. For every motion $f$, the inverse mapping $f^{-1}$ (which exists by the definition
of a motion as a bijective mapping of the set $\Pi$) is also a motion.

2. The composition of two motions is a motion.

3. A motion preserves the order of points. That is, a motion $f$ takes a line $l$ to
a line $f(l)$, and if the point C on the line $l$ lies between points A and B on
this line, then the point $f(C)$ of the line $f(l)$ lies between points $f(A)$ and
$f(B)$.




The formulation of the fourth axiom of motion requires certain results that can be 
obtained as corollaries of the axioms of relationship and order. We shall not prove 
these here, but let us give only the formulations.²

Let us begin with properties of lines. Let us choose a point O on a line $l$. Points
A and B on this same line, both of them different from O, are said to lie on one
side of O if O does not lie between A and B. If we select some point A different
from O, then the points B different from O and lying together with A on one side of O
form a subset of the set of points of the line $l$ called a half-line and denoted by $l^{+}$.
It can be proved that if we choose in this subset another point $A'$, then the half-line
formed with it will be the same as before. Here what is important is only the choice
of the point O. If we choose a point $A_1$ such that O lies between A and $A_1$, then
the point $A_1$ determines another half-line, denoted by $l^{-}$. The half-lines $l^{+}$ and $l^{-}$
determined by the points A and $A_1$ do not intersect, and their union is $l \setminus O$, that is,
$l^{+} \cap l^{-} = \varnothing$ and $l^{+} \cup l^{-} = l \setminus O$.

One can verify analogous properties for a line $l$ in the plane $\Pi$. Let us consider
two points A and B that do not belong to the line $l$. One says that they lie on one
side of $l$ if either the line $l'$ passing through them does not intersect the line $l$, or the
lines $l$ and $l'$ intersect in a point C that does not lie between points A and B of the
line $l'$. The set of points not lying on the line $l$ and lying on the same side of $l$ as the
point A is called a half-plane. Again, it is possible to prove that with the choice of
another point $A'$ instead of A in this half-plane, we define the same set. There exist
two points A and $A'$ that do not belong to the same half-plane. However we select
these points (given a fixed line $l$), we will always obtain two subsets $\Pi^{+}$ and $\Pi^{-}$
of the plane $\Pi$ such that $\Pi^{+} \cap \Pi^{-} = \varnothing$ and $\Pi^{+} \cup \Pi^{-} = \Pi \setminus l$.

Suppose we are given a point O and a line $l$ passing through it. If in the partition
of $l \setminus O$ into two half-lines, one of them is distinguished, and in the partition of $\Pi \setminus l$
into two half-planes, one of them is distinguished (for example, let us denote them
by $l^{+}$ and $\Pi^{+}$, respectively), then the pair $(O, l)$ is called a flag and is denoted
by $\Phi$. As follows from what was discussed in Sect. 12.1, this is a special case (for
$n = 2$) of the notion of a flag introduced earlier.

Every motion takes a flag to a flag, that is, if $f$ is a motion and $\Phi$ is the flag
$(O, l)$, then the sets $f(l)^{+}$ and $f(l)^{-}$, whose union is $f(l) \setminus f(O)$, coincide with
$f(l^{+})$ and $f(l^{-})$, where $l^{+}$ and $l^{-}$ are the half-lines on the line $l$ determined by
the point O. Here their order can change. Analogously, the pair of half-planes $f(\Pi)^{+}$
and $f(\Pi)^{-}$ defined by the line $f(l)$ coincides with the pair $f(\Pi^{+})$ and $f(\Pi^{-})$,
where $\Pi^{+}$ and $\Pi^{-}$ are the half-planes determined by the line $l$. Their order also
can change.

We can now formulate the last (fourth) axiom of motion: 


4. Axiom of free mobility. For any two flags $\Phi$ and $\Phi'$, there exists a motion $f$
taking the first flag to the second, that is, $f(\Phi) = \Phi'$. Such a motion is unique;
that is, it is uniquely determined by the flags $\Phi$ and $\Phi'$.


² Some of these are proved in first courses in geometry, and in any case, elementary proofs of all of
these results can be found in Chap. 2 of the book Higher Geometry, by N.V. Efimov (Mir, 1953).




IV. Axiom of continuity
1. Let the set of points of some line $l$ be represented arbitrarily as the union of
two sets $M_1$ and $M_2$, where no point of the set $M_1$ lies between two points
of the set $M_2$, and conversely. Then there exists a point O on the line $l$ such
that $M_1$ and $M_2$ coincide with the half-lines of $l$ determined by the point O,
to either of which the point O can be joined.


This axiom is also called Dedekind’s axiom. 

Axioms I–IV that we have presented are called axioms of “absolute geometry.”
They hold for both Euclidean and hyperbolic geometry. These two geometries are 
distinguished by the addition of one axiom that deals with parallel lines. Let us 
recall that parallel lines are lines having no points in common. Thus in both cases, 
one more axiom is added: 


V. Axiom of parallel lines 


1. In Euclidean geometry: For every line $l$ and every point A not lying on it,
there exists at most one line $l'$ passing through the point A and parallel to $l$.

1′. In hyperbolic geometry: For every line $l$ and every point A not lying on it,
there exist at least two distinct lines $l'$ and $l''$ passing through the point A and parallel to $l$.


The justified interest in precisely these two axioms is due to the fact that already
in absolute geometry (that is, with only the axioms from groups I–IV), it is possible
to prove that for every line $l$ and every point A not on $l$, there exists at least one line
$l'$ passing through A and parallel to $l$.

It is now possible to formulate more precisely the goal that mathematics set for
itself in the attempt to “prove the fifth postulate,” that is, to derive assertion 1 in
group V of axioms from the axioms in groups I–IV. But Lobachevsky (and other
researchers of the same epoch) came to the conclusion that this was impossible, and
this meant that the system comprising groups I–IV and axiom 1′ was consistent.

Strictly speaking, we could have posed such questions even earlier, in connection 
with any of the theories that we encountered based on some system of axioms, 
such as the theory of vector spaces or that of Euclidean spaces. The question of the 
consistency of the concepts of vector spaces or Euclidean spaces is easily answered: 
it suffices to exhibit (in the case of real spaces) the examples of the vector spaces $\mathbb{R}^n$ of
any finite dimension, or of the Euclidean spaces with inner product $(x, y) = x_1 y_1 + \cdots +
x_n y_n$. Of course, this assumes the construction and proof of the consistency of the
theory of the real numbers, but that lies outside the scope of our investigation, and
we shall not consider it here. However, assuming as given that the properties of real
numbers are defined and do not raise any doubts, we may, for example, say that if
the system of axioms of a real vector space given in Sect. 3.1 were inconsistent, then
we would be able to derive two mutually contradictory assertions about the space
$\mathbb{R}^n$. However, any assertion about the space $\mathbb{R}^n$ can be reduced by definition to an
assertion about the real numbers, and then we would obtain a contradiction in the
domain of real numbers.

The same question could be posed in relationship to Euclidean geometry, that
is, with respect to the system of axioms consisting of axioms of groups I–IV and
axiom 1 of group V. Here the answer is in fact already known, since we have
constructed the theory of affine Euclidean spaces (even in arbitrary dimension $n$). It is
easily ascertained that for $n = 2$, all the axioms of Euclidean geometry that we
introduced are satisfied. Some refinements are perhaps necessary only in connection
with the axioms of order.

These axioms do not require an inner product on the space and are formulated 
for an arbitrary real affine space V in Sect. 8.2. All the assertions constituting the 
axioms of order now follow directly from the properties of order of the real num- 
bers, except only Pasch’s axiom. Its idea is that if a line “enters” a triangle, then it 
must “exit” from it. Intuitively, this is quite convincing, but with our approach, we 
must derive this assertion from the properties of affine spaces. It is a very simple 
argument, whose details we leave to the reader. 

Specifically, by what is given, points A and B (we shall use the same notation
as in the formulation of the axioms) lie in different half-planes into which the line
$l$ divides the plane $\Pi$. Everything depends on the half-plane to which the point C
belongs: to the same one as A, or to the same one as B. In the first case, the line $l$
has a common point with the line $l_1$, which lies on it between B and C, while in the
second case, the common point is with the line $l_2$, on which it lies between A and C; see
Fig. 12.3. In each of these two cases, the assertion of Pasch’s axiom is easy to verify
if we recall the definitions.
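The case analysis above can be tried out in the coordinate plane. The Python sketch below is only an illustration under our own conventions, not the argument left to the reader: a line is given by hypothetical coefficients $(p, q, r)$ of an equation $px + qy + r = 0$, the half-plane of a point is read off from the sign of $px + qy + r$, and the function names `side`, `crosses`, and `pasch_sides` are ours.

```python
def side(line, point):
    """Value of p*x + q*y + r; its sign tells the half-plane of the point."""
    p, q, r = line
    x, y = point
    return p * x + q * y + r


def crosses(line, P, Q):
    """True if the line meets the segment [P, Q] strictly between P and Q."""
    return side(line, P) * side(line, Q) < 0


def pasch_sides(line, A, B, C):
    """List the sides of the triangle [A, B, C] that the line intersects.

    None of the three points is assumed to lie on the line.
    """
    if any(side(line, X) == 0 for X in (A, B, C)):
        raise ValueError("no vertex may lie on the line")
    sides = {"AB": (A, B), "BC": (B, C), "CA": (C, A)}
    return [name for name, (P, Q) in sides.items() if crosses(line, P, Q)]


if __name__ == "__main__":
    line = (1.0, 0.0, -0.5)  # the vertical line x = 0.5
    A, B = (0.0, 0.0), (1.0, 0.0)
    print(pasch_sides(line, A, B, (0.0, 1.0)))  # C on the same side as A
    print(pasch_sides(line, A, B, (1.0, 1.0)))  # C on the same side as B
```

Since C shares its half-plane with exactly one of A and B, a line that crosses the side [A, B] crosses exactly one of the two other sides, in agreement with the two cases of Fig. 12.3.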

We in fact checked in one form or another that the remaining axioms are satisfied 
even as assertions that relate to arbitrary dimension. 

We shall now turn to the axioms of hyperbolic geometry, that is, the axioms of 
groups I-IV and axiom 1’ of group V. We shall prove that they are consistent, based 
on the consistency of the usual properties (which likewise are easily reduced to 
certain axioms) of the set of real numbers R and based on the theory of Euclidean 
spaces of dimension 2 and 3 constructed on this basis. On this foundation, we shall 
prove the following result. 


Theorem 12.8 The system of axioms of hyperbolic geometry is consistent. 


Proof We shall consider in the Euclidean plane $L$ the open disk $K$ (given, for example,
in some coordinate system by the condition $x^2 + y^2 < 1$). We shall call the set
of its points a “plane” (denoted by $\Pi$), and we shall call “points” only the points of
this disk. The intersection with the disk $K$ of every line $l$ of the plane $L$ that has at
least one point in common with this disk is the interior of some segment (this was
proved in the previous section). We shall call such nonempty intersections $l \cap K$
“lines,” denoted by $\bar{l}$, $\bar{l}'$, and so on. Finally, we shall call a projective transformation
of the plane $L$ taking the disk $K$ into itself a “motion.”

Since the definition of projective transformation assumes a study of the projec- 
tive plane, and a projective space of dimension n and its projective transformations 
were defined in Chap. 9 in terms of a vector space of dimension n + 1, it follows 
that for the analysis of the hyperbolic plane, we must use here a notion connected 
with a three-dimensional vector space. However, it would not be difficult to give a 
formulation appealing only to properties of the Euclidean plane. 


Fig. 12.4 “Lines” and “points” of the hyperbolic plane


Now let us define the fundamental relationships between “lines” and “points.”
That a “line” $\bar{l}$ passes through a “point” $A \in \Pi$ will be understood to mean the
condition that the line $l$ passes through the point A. Thus an arbitrary “line” $\bar{l}$ is the
set of “points” that lie on it. Let “points” A, B, C lie on the “line” $\bar{l}$. We shall say
that a “point” C lies between “points” A and B if such is the case for A, B, and C as
points on the Euclidean line $l$ that contains $\bar{l}$ (this makes sense, since $\bar{l}$ is contained
in Euclidean space).

It remains to verify that the notions and relationships presented satisfy the axioms 
of hyperbolic geometry, that is, the axioms of groups I-IV and axiom 1’ of group V. 
The verification of this for the axioms of groups I, II, and IV is trivial, since the 
corresponding objects and relationships are defined exactly as in the surrounding 
Euclidean plane. For the axioms of group III (axioms of motion), the required
properties were proved in the previous section (indeed, for the case of a space of arbitrary
dimension $n$). It remains only to consider axiom 1′ of group V.

Let $\bar{l}$ be the “line” associated with the line $l$ in the Euclidean plane $L$. Then the
line $l$ intersects the boundary $S$ of the disk $K$ in two different points: $P'$ and $P''$.
Let A be a “point” of the “plane” $\Pi$ (that is, a point of the disk $K$) not lying on
the line $l$. By the axioms of Euclidean geometry, through the points A and $P'$ in
the plane $L$, there passes some line $l'$. It determines the “line” $\bar{l}' = l' \cap K$ of the
“plane” $\Pi$. Similarly, the point $P''$ determines the “line” $\bar{l}'' = l'' \cap K$; see Fig. 12.4.

The lines $l'$ and $l''$ are distinct, since they pass through different points $P'$ and
$P''$ of the plane $L$. Therefore, by the axioms of Euclidean geometry, they have no
common points other than A. But the “lines” $\bar{l}'$ and $\bar{l}''$, as nonempty segments of
Euclidean lines excluding the endpoints, contain infinitely many points and in
particular, the “points” $B' \in \bar{l}'$ and $B'' \in \bar{l}''$, with $B' \ne B''$. This means that the “lines”
$\bar{l}'$ and $\bar{l}''$ are distinct. On the other hand, in the sense of our definitions, both of them
are parallel to the “line” $\bar{l}$, that is, they have no common “points” with it (points
of the disk $K$). For example, the line $l'$ has with $l$ the common point $P'$ in the
Euclidean plane $L$, which means that by the axioms of Euclidean geometry, they have
no other common points, and in particular, no common points in the disk $K$.
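The construction of the two parallels can be followed in coordinates. The Python sketch below is an illustration under our own assumptions: the disk $K$ is the unit disk $x^2 + y^2 < 1$, a Euclidean line is given by two of its points, and the function names `circle_endpoints`, `meet`, and `in_open_disk` are hypothetical. It finds the boundary points $P'$ and $P''$ of a chord and checks where the lines through A and these points meet the original line.

```python
import math


def circle_endpoints(P, Q):
    """Points where the Euclidean line through P and Q meets the unit circle."""
    dx, dy = Q[0] - P[0], Q[1] - P[1]
    qa = dx * dx + dy * dy
    qb = 2.0 * (P[0] * dx + P[1] * dy)
    qc = P[0] ** 2 + P[1] ** 2 - 1.0
    disc = qb * qb - 4.0 * qa * qc
    if disc <= 0:
        raise ValueError("the line does not pass through the open disk")
    root = math.sqrt(disc)
    t1, t2 = (-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)
    return (P[0] + t1 * dx, P[1] + t1 * dy), (P[0] + t2 * dx, P[1] + t2 * dy)


def meet(P, Q, R, S):
    """Intersection point of the lines PQ and RS, or None if they are parallel."""
    d1 = (Q[0] - P[0], Q[1] - P[1])
    d2 = (S[0] - R[0], S[1] - R[1])
    det = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(det) < 1e-12:
        return None
    t = ((R[0] - P[0]) * d2[1] - (R[1] - P[1]) * d2[0]) / det
    return (P[0] + t * d1[0], P[1] + t * d1[1])


def in_open_disk(X, eps=1e-9):
    """True if X lies strictly inside the unit disk (with a small tolerance)."""
    return X[0] ** 2 + X[1] ** 2 < 1.0 - eps


if __name__ == "__main__":
    P, Q = (-0.5, 0.0), (0.5, 0.0)  # two points of the chord y = 0
    A = (0.0, 0.5)                  # a "point" not on that chord
    for E in circle_endpoints(P, Q):
        X = meet(A, E, P, Q)
        print("boundary point", E, "common point", X, "inside K:", in_open_disk(X))
```

In this example both common points lie on the boundary circle rather than in $K$, so the two “lines” through A have no “points” in common with the given “line.”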

We see that assertion 1′ holds for every “line” $\bar{l} \subset \Pi$ and every “point” $A \notin \bar{l}$.
Let us now assume that from the axioms of hyperbolic geometry there could be
derived an inconsistency (that is, some assertion and its negation). Then we could
apply the same reasoning to the notions that earlier, in the proof of Theorem 12.8,
we wrote in quotation marks: “point,” “plane,” “line,” and “motion.” Since they,
as we have seen, satisfy all the axioms of hyperbolic geometry, we would again
arrive at a contradiction. But the notions “plane,” “line,” and “motion,” and also
the relationship “lies between” for three points on a line were defined in terms of
Euclidean geometry. Thus we would arrive at a contradiction to Euclidean geometry
itself.


Let us focus attention on this fine logical construction: we construct objects in 
some domain that satisfy a certain system of axioms, and thus we prove the con- 
sistency of this system if the consistency of the domain from which the necessary 
objects are taken has been accepted. Today, one says that a model of this axiom 
system has thereby been constructed in another domain. In particular, we earlier 
constructed a model of hyperbolic geometry in the theory of vector spaces. Only by 
constructing such a model was the question of the provability of the “fifth postulate” 
decided in mathematics. 

In conclusion, it is of interest to dwell a bit on the history of this question.
Independently of Lobachevsky, a number of researchers came to the conclusion that a
negation of the “fifth postulate” leads to a meaningful and consistent branch of
mathematics, a “new geometry,” eventually given the designation “non-Euclidean
geometry.” There is no question here of priority. All the researchers clearly worked
independently of one another (Gauss’s correspondence from the 1820s, Lobachevsky’s
publication of 1829, and János Bolyai’s of 1832). Most of those who became known
later were amateurs, not professional mathematicians. But there were some
exceptions: besides Lobachevsky, there was the greatest mathematician of that epoch,
Gauss. The majority of such researchers known to us who clearly arrived at the
same conclusions independently became known precisely because of their
correspondence with Gauss, which was published along with others of Gauss’s papers
after his death. It is clear from these publications that in his youth, Gauss had
attempted to prove the fifth postulate, but later concluded that there existed a
meaningful and consistent geometry that did not include this postulate. In his letters, Gauss
discussed the similar views of his correspondents with great interest.

He clearly received the work of Lobachevsky with sympathetic understand- 
ing when it began to appear in translation, and on Gauss’s recommendation, 
Lobachevsky was elected a member of the Göttingen Academy of Sciences.

In one of Gauss’s diaries can be seen the name Nikolai Ivanovich Lobachevsky, 
written in Cyrillic letters: 


НИКОЛАЙ ИВАНОВИЧ ЛОБАЧЕВСКИЙ


But it is surprising that Gauss himself, throughout his entire life, published not a 
line on this subject. Why was that? The usual explanation is that Gauss was afraid 
of not being understood. Indeed, in one letter in which he touched on the question 
of the “fifth postulate” and non-Euclidean geometry, he wrote, “since I fear the 
clamor of the Boeotians.” But it seems that this cannot be the full explanation of 
his mysterious silence. In his other works, Gauss did not fear being misunderstood
by his readers.3 It is possible, however, that there is another explanation for Gauss’s
silence. He was one of the few who realized that however many interesting theorems 
of non-Euclidean geometry might be deduced, this would prove nothing definitively; 
there would always remain the theoretical possibility that future derivations would 
yield a contradictory assertion. And perhaps Gauss understood (or sensed) that at 
the time (first half of the nineteenth century), the mathematical concepts had not yet 
been developed to pose and solve this question rigorously. 

Apparently, Lobachevsky was among the small number of mathematicians in 
addition to Gauss who understood this. For him, as with Gauss, there stood the 
question of “incomprehensibility.” First of all, for Lobachevsky, there was the lack 
of comprehension among Russian mathematicians, especially analysts, who totally 
failed to accept his work. In any case, he constantly attempted to find a consistent 
foundation for his geometry. For example, he discovered its striking parallel with 
spherical geometry and expressed the idea that it was the “geometry of the sphere 
with imaginary radius.” His geometry could indeed have been realized in the form 
of some other model if the very notion of model had been sufficiently developed at 
that time. 

Beyond this (as noted by the French mathematician André Weil (1906-1998)), 
here we have the simplest case of duality between compact and noncompact sym-
metric spaces, discovered in the twentieth century by Élie Cartan.

Moreover, Lobachevsky proved that in three-dimensional hyperbolic space, there 
is a surface (called today a horosphere) such that if we consider only the set of its 
points and take as lines the curves of a specific type lying on it (called today horo- 
cycles), then all the axioms of Euclidean geometry are satisfied. From this it follows 
that if hyperbolic geometry is consistent, then Euclidean geometry is also consistent. 
Even if we accept the hypothesis that the “fifth postulate” does not hold, Euclidean 
geometry is still realized on the horosphere. Thus in principle, Lobachevsky came 
very close to the concept of a model. But he did not succeed in constructing a model 
of hyperbolic geometry in the framework of Euclidean geometry. Such a construc- 
tion was not easily granted to mathematicians. 

The following paragraph offers only a hint, and not a precise formulation, of the 
corresponding assertions. 

First, in 1868, Eugenio Beltrami (1835-1899) constructed in three-dimensional 
Euclidean space a certain surface called a pseudosphere or Beltrami surface, whose 
Gaussian curvature (see the definition on p. 265) at every point is the same nega- 
tive number. Hyperbolic geometry can be realized on the pseudosphere, where the 
role of lines is played by so-called geodesic lines.4 However, here we are talking 
about only a piece of the pseudosphere and a piece of the hyperbolic plane. Here the 
posing of the question must be radically changed, since the majority of the axioms 
that we have given assume (as in, for example, Euclidean geometry) the possibility 


3For example, his first published book, Disquisitiones Arithmeticae, was considered for a long time 
to be quite inaccessible. 


4More about this can be found, for example, in the book A Course of Differential Geometry and 
Topology, by A. Mishchenko and A. Fomenko (Mir, 1988). 




of continuing lines to infinity. The coincidence of two bounded pieces is under- 
stood in the sense of the coincidence of the measures of lengths and angles, about 
which, in the case of hyperbolic geometry, more will be said in the following sec- 
tion. Moreover, Hilbert later proved that the hyperbolic plane cannot in this sense 
be completely identified with any surface in three-dimensional space (much later it 
was proved that it is possible for some surface in five-dimensional space). 

The model of hyperbolic geometry that we gave for the proof of Theorem 12.8 
was constructed by Felix Klein (1849-1925) in 1870. The history of its appearance 
was also astounding. Formally speaking, this model was constructed in 1859 by the 
English mathematician Arthur Cayley (1821-1895). But he considered it only as a 
certain construction in projective geometry and apparently did not notice the con- 
nection with non-Euclidean geometry. In 1869, the young (twenty-year-old) Klein 
became acquainted with his work. He recalled that in 1870, he gave a talk on the 
work of Cayley at the seminar of the famous mathematician Weierstrass, and, as he 
writes, “I finished with a question whether there might exist a connection between 
the ideas of Cayley and Lobachevsky. I was given the answer that these two sys- 
tems were conceptually widely separated.” As Klein puts it, “I allowed myself to 
be convinced by these objections and put aside this already mature idea.” However, 
in 1871, he returned to this idea, formulated it mathematically, and published it. 
But then his work was not understood by many. In particular, Cayley himself was 
convinced as long as he lived that there was some logical error involved. Only after 
several years were these ideas fully understood by mathematicians. 

Of course, one can ask not only about the existence of Euclidean and hyperbolic 
geometries, but also about a number of different (in a certain sense) geometries. 
Here we shall formulate only the results that are relevant to the current discussion.5

First of all, we must give a precise sense to what we mean by “different” or 
“identical” geometries. This can be done with the help of the notion of isomorphism 
of geometries, which is analogous to the notion of isomorphism of vector spaces 
introduced earlier. Within the framework of a system of axioms used in this section, 
this can be done as follows. Let Π and Π′ be two planes satisfying the axioms of
groups I–IV, and let G and G′ be sets of motions of the respective planes. Mappings
φ: Π → Π′ and ψ: G → G′ define an isomorphism (φ, ψ) of these geometries if
the following conditions are satisfied:


(1) Both mappings φ and ψ are bijections.

(2) The mapping φ takes every line l in the plane Π to some line φ(l) in the
plane Π′.

(3) The mapping φ preserves the relationship “lies between.” This means that if
points A, B, and C lie on the line l, with C lying between A and B, then the
point φ(C) lies between φ(A) and φ(B) on the line φ(l).

(4) The mappings φ and ψ agree in the following sense: for every motion f ∈ G,
its image ψ(f) is equal to φfφ⁻¹. This means that for every point A ∈ Π, the
equality (ψ(f))(φ(A)) = φ(f(A)) holds.


5Their proofs are given in every course in higher geometry, for example, in the book Higher Ge- 
ometry, by N.V. Efimov, mentioned earlier. 




(5) For every motion f ∈ G, the equality ψ(f⁻¹) = ψ(f)⁻¹ holds, and for every
pair of motions f₁, f₂ ∈ G, we have ψ(f₁f₂) = ψ(f₁)ψ(f₂).


Let us note that some of these conditions can be derived from the others, but for 
brevity, we shall not do this. 

We shall consider geometries up to isomorphism as just described, that is, we 
shall consider two geometries the same if there exists an isomorphism between 
them. In particular, geometries with respective axioms 1 and 1′ in group V are
clearly not isomorphic to each other, that is, they are two different geometries. From
this point of view, geometries (in the plane) satisfying axioms 1 and 1′ are funda-
mentally different from each other. Namely, it has been proved that all geometries
satisfying axiom 1 in group V are isomorphic.6 But geometries that satisfy axiom
1′ in group V are characterized up to isomorphism by a certain real number c called
their curvature. This number is usually assumed to be negative, and then it can take
on any value c < 0.

Klein suggested that Euclidean geometry can be viewed as the limiting case of
hyperbolic geometry as the curvature c approaches zero.7 As Klein further observed,
if axiom 1 (of Euclid) is satisfied in our world, then we shall never know it. Since
every physical measurement is taken with a certain degree of error, to establish the 
precise equality c = 0 is impossible, for there always remains the possibility that the 
number c is less than zero, but it is so small in absolute value that it lies beyond the 
limits of our measurements. 


12.3 Some Formulas of Hyperbolic Geometry* 


First of all, we shall define the distance between points in the hyperbolic plane using 
its definition as the set of points of the projective plane P(L) corresponding to the 
lines of the three-dimensional pseudo-Euclidean space L lying within the light cone 
and its interpretation as the set of points on the unit circle U in the affine Euclidean 
plane E; see Sect. 12.1. 

The meaning of the notion of distance is that it should be preserved under mo- 
tions of the hyperbolic plane. But we have defined a motion as a certain special 
projective transformation P(𝒜) of the projective plane P(L). Theorem 9.16 shows
that in general, it is impossible to associate a number that does not change under 
an arbitrary projective transformation not only with two points, but even with three 
points of the projective line. But we shall use the fact that motions of the hyperbolic 
plane are not arbitrary projective transformations P(L), but only those that take the 
light cone in the space L into itself. 

Namely, to two arbitrary points A and B correspond the lines ⟨a⟩ and ⟨b⟩, lying
inside the light cone. We shall show that they determine two additional points, P 


6Of course, here we are assuming that they all satisfy the axioms of groups I–IV.
7Felix Klein. Nicht-Euklidische Geometrie, Göttingen, 1893. Reprinted by AMS Chelsea, 2000.


Fig. 12.5 The segment [PQ], with the points in the order P, A, B, Q


and Q, that correspond to lines lying on the light cone. But four points of a projec- 
tive space lying on a line already determine a number that does not change under 
arbitrary projective transformations, namely their cross ratio (defined in Sect. 9.3). 
We shall use this number for defining the distance between points A and B. This 
definition has the special feature that it uses points corresponding to lines lying on 
the light cone (P and Q), which are thus not points of the hyperbolic plane. 

We shall assume that the points A and B are distinct (if they coincide, then the 
distance between them is zero by definition). This means that the vectors a and b are 
linearly independent. It is obvious that then a unique projective line l passes through
these points; it corresponds to the plane L′ = ⟨a, b⟩. The line l determines a line l′
in the affine Euclidean space E, depicted in Figs. 12.1 and 12.2. Since the line l′
contains the points A and B, which lie inside the circle U, it intersects its boundary
in two points, which we shall take as P and Q. This was in fact already proved in 
Sect. 12.1, but we shall now repeat the corresponding argument. 

The points of l are the lines ⟨x⟩ consisting of all vectors proportional to the
vectors x = $\overrightarrow{OA} + t\,\overrightarrow{AB}$, where t is an arbitrary real number. Here the vector $\overrightarrow{OA}$
equals a, and the vector $\overrightarrow{AB}$ = c belongs to the subspace E₀ if we assume that the
points A, B and the line l′ lie in the affine space E. This means that x = a + tc,
where the vector c can be taken as fixed, and the number t as variable. Points x at
the intersection of the line l′ with the light cone V ⊂ L are given by the condition
(x²) = 0, that is,

\[
((a + tc)^2) = (a^2) + 2(a, c)\,t + (c^2)\,t^2 = 0. \tag{12.18}
\]

We know that (a²) < 0, and the vector c belongs to E₀. Since E₀ is a Euclidean
space and the points A and B are distinct, it follows that (c²) > 0. From this it
follows that the quadratic equation (12.18) in the unknown t has two real roots t₁
and t₂ of opposite signs. Suppose for the sake of definiteness that t₁ < t₂. Then
for t₁ < t < t₂, the value of ((a + tc)²) is negative, and all points of the line l′
corresponding to the values t in this interval belong to the hyperbolic plane. We see
that the line l′ intersects the light cone V in two points corresponding to the values
t = t₁ and t = t₂, while the values t₁ < t < t₂ are associated with the points of the
line L₁ (that is, one-dimensional hyperbolic space) passing through A and B. Thus
the line L₁ coincides with the segment of the line l′ ⊂ E whose endpoints are P and Q,
which correspond to the values t = t₁ and t = t₂; see Fig. 12.5.
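As a numerical illustration of this argument, the following minimal sketch solves equation (12.18) for two concrete points. It rests on coordinates that are an assumption made here, not fixed in the text above: L is taken to be three-dimensional space with the quadratic form (x²) = x₁² + x₂² − x₃², and E is the affine plane x₃ = 1, so that a = (A₁, A₂, 1) and c = (B₁ − A₁, B₂ − A₂, 0). The names form and cone_parameters are purely illustrative.

```python
import math

def form(u, v):
    """Pseudo-Euclidean inner product (assumed signature +, +, -)."""
    return u[0] * v[0] + u[1] * v[1] - u[2] * v[2]

def cone_parameters(A, B):
    """Roots t1 < t2 of (a^2) + 2(a, c) t + (c^2) t^2 = 0, equation (12.18)."""
    a = (A[0], A[1], 1.0)                  # vector OA, a point of the plane x3 = 1
    c = (B[0] - A[0], B[1] - A[1], 0.0)    # vector AB, lies in E0 (x3 = 0)
    qa, qac, qc = form(a, a), form(a, c), form(c, c)
    root = math.sqrt(qac * qac - qa * qc)  # real, since qa < 0 < qc
    return (-qac - root) / qc, (-qac + root) / qc

A, B = (0.2, 0.1), (0.5, -0.3)             # two points inside the unit circle
t1, t2 = cone_parameters(A, B)

# A corresponds to t = 0 and B to t = 1, so P (beyond A) is at t1, Q (beyond B) at t2.
P = (A[0] + t1 * (B[0] - A[0]), A[1] + t1 * (B[1] - A[1]))
Q = (A[0] + t2 * (B[0] - A[0]), A[1] + t2 * (B[1] - A[1]))
print(t1, t2)                              # roots of opposite signs
print(math.hypot(*P), math.hypot(*Q))      # both endpoints lie on the unit circle
```

Under these assumptions the roots bracket the values t = 0 and t = 1, which is the statement that A and B both lie in the interval (P, Q).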

It is clear that point A is contained in the interval (P, Q). Applying the same 
argument to the point B, we obtain that the point B is also in the interval (P, Q). 

Let us label the points P and Q in such a way that P will denote the endpoint of 
the interval (P, Q) that is closer (in the sense of Euclidean distance) to the point A, 
and by Q the endpoint that is closer to B, as depicted in Fig. 12.5. 




Now it is possible to give a definition of the distance between points A and B, 
which we shall denote by r(A, B): 


\[ r(A, B) = \log DV(A, B, Q, P), \tag{12.19} \]


where DV(A, B, Q, P) is the cross ratio (see p. 337). Let us note that in the defi- 
nition (12.19), we have not indicated the base of the logarithm. We could take any 
base greater than 1, since a change in base results simply in multiplying all distances 
by some fixed positive constant. But in any case, the length of a segment AB can be 
defined only up to a multiplicative factor that corresponds to the arbitrariness in the 
selection of a unit length on a line. 

We shall explain a bit later why the logarithm appears in definition (12.19). The 
reason for using the cross ratio is explained by the following theorem. 


Theorem 12.9 The distance r(A, B) does not change under any motion f of the 
hyperbolic plane, that is, r( f(A), f(B)) =r(A, B). 


Proof The assertion of the theorem follows at once from the fact that a motion f of
the hyperbolic plane is determined by a certain projective transformation P(𝒜). This
transformation P(𝒜) carries the line l′ passing through points A and B to the line
passing through the points P(𝒜)(A) and P(𝒜)(B). This means that the transforma-
tion takes the points P and Q, the intersection of the line l′ with the boundary of the
disk U, to the points P′ and Q′, the intersection of the line P(𝒜)(l′) with this bound-
ary. That is, P′ = P(𝒜)(P) and Q′ = P(𝒜)(Q), or conversely, Q′ = P(𝒜)(P) and
P′ = P(𝒜)(Q). Moreover, the transformation P(𝒜) preserves the cross ratio of four
points on a line (Theorem 9.17).


To explain the role of the cross ratio, we jumped a bit ahead and skipped the 
verification that the argument of the logarithm in formula (12.19) was a number 
greater than 1 and also that in the definition of r(A, B), all the conditions entering
into the definition of a distance (p. xvii) were satisfied. We now return to this. 

Let us assume that the points P, A, B, Q are arranged in the order shown in
Fig. 12.5. For the cross ratio, we may use formula (9.28),

\[
DV(A, B, Q, P) = \frac{|AQ| \cdot |PB|}{|BQ| \cdot |PA|} > 1, \tag{12.20}
\]

since clearly, |AQ| > |BQ| and |PB| > |PA|. Therefore, the argument of the loga-
rithm in formula (12.19) is a number greater than 1, and so the logarithm is a positive
real number. Therefore, r(A, B) > 0 for all pairs of distinct points A and B.
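The definition can be tried out numerically. The sketch below computes r(A, B) in the unit disk from the Euclidean lengths in (12.20) and then checks Theorem 12.9 for one particular motion, namely the map of Example 12.4 (Sect. 12.1) written with b = −aλ. The natural logarithm and the function names boundary_points, dist, and motion are choices made here for illustration; they are not notation from the text.

```python
import math

def boundary_points(A, B):
    """Endpoints P (beyond A) and Q (beyond B) of the chord through A and B."""
    dx, dy = B[0] - A[0], B[1] - A[1]
    qa = A[0] ** 2 + A[1] ** 2 - 1.0       # negative: A is inside the disk
    qac = A[0] * dx + A[1] * dy
    qc = dx * dx + dy * dy                 # positive: A and B are distinct
    root = math.sqrt(qac * qac - qa * qc)
    t1, t2 = (-qac - root) / qc, (-qac + root) / qc
    return (A[0] + t1 * dx, A[1] + t1 * dy), (A[0] + t2 * dx, A[1] + t2 * dy)

def dist(A, B):
    """r(A, B) = log DV(A, B, Q, P), with DV as in (12.20) and the natural log."""
    if A == B:
        return 0.0
    P, Q = boundary_points(A, B)
    dv = (math.dist(A, Q) * math.dist(P, B)) / (math.dist(B, Q) * math.dist(P, A))
    return math.log(dv)

def motion(X, lam):
    """Motion with a^2 - b^2 = 1 and b = -a*lam; it sends (lam, 0) to the center."""
    a = 1.0 / math.sqrt(1.0 - lam * lam)
    x, y = X
    return ((x - lam) / (1.0 - lam * x), y / (a * (1.0 - lam * x)))

print(dist((0.0, 0.0), (0.5, 0.0)))        # cross ratio is 3, so this is log 3

A, B = (0.2, 0.1), (0.5, -0.3)
before = dist(A, B)
after = dist(motion(A, 0.4), motion(B, 0.4))
print(before, after)                       # equal up to rounding (Theorem 12.9)
```

For the first example P = (−1, 0) and Q = (1, 0), so |AQ| = 1, |PB| = 3/2, |BQ| = 1/2, |PA| = 1, and the cross ratio is 3.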

Let us note that it would be possible to make do without the order of the points P 
and Q that we chose. For this, it would be sufficient to verify (this follows directly 
from the definition of the cross ratio) that under a transposition of the points P and 
Q, the cross ratio d is converted into 1/d. Thus the logarithm (12.19) that gives the 
distance is defined up to sign, and we can define the distance as the absolute value. 




If we interchange the positions of A and B, then the points P and Q defined in
the agreed-upon way also exchange places. It is easy to verify that the cross ratio,
and hence the distance determined by formula (12.19), does not change. In other
words, we have the equality


r(B, A) =r(A, B). (12.21) 


For any third point C collinear with A and B and lying between them, the con-
dition

\[ r(A, B) = r(A, C) + r(C, B) \tag{12.22} \]

is satisfied. It follows from the fact that (in the notation we have adopted)

\[
DV(A, B, Q, P) = \frac{|AQ| \cdot |BP|}{|BQ| \cdot |AP|} = DV(A, C, Q, P) \cdot DV(C, B, Q, P), \tag{12.23}
\]

since

\[
DV(A, C, Q, P) = \frac{|AQ| \cdot |CP|}{|CQ| \cdot |AP|}, \qquad
DV(C, B, Q, P) = \frac{|CQ| \cdot |BP|}{|BQ| \cdot |CP|}. \tag{12.24}
\]

For the verification, it remains only to substitute the expressions (12.24) into for- 
mula (12.23). 
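Written out, the substitution is a one-line cancellation of the factors |CQ| and |CP|:

```latex
\[
DV(A, C, Q, P) \cdot DV(C, B, Q, P)
  = \frac{|AQ| \cdot |CP|}{|CQ| \cdot |AP|} \cdot \frac{|CQ| \cdot |BP|}{|BQ| \cdot |CP|}
  = \frac{|AQ| \cdot |BP|}{|BQ| \cdot |AP|}
  = DV(A, B, Q, P).
\]
```

Taking logarithms of both sides turns the product into the sum in (12.22).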

In any sufficiently complete course in geometry, it is proved without using the
parallel postulate (that is, in the framework of “absolute geometry”) that there exists
a function r(A, B) of a pair of points A and B that satisfies the following condi-
tions:


1. r(A, B) > 0 if A ≠ B, and r(A, B) = 0 if A = B;

2. r(B, A) = r(A, B) for all points A and B;

3. r(A, B) = r(A, C) + r(C, B) for every point C collinear with A and B and lying
between them;


and most importantly, 
4. the function r(A, B) is invariant under motions. 


Using the definitions given at the beginning of this book, we may say in short that 
r(A, B) is a metric on the set of points in the plane under consideration and motions 
are isometries of this metric space. 

Such a function is unique if we fix two distinct points A₀ and B₀ for which
r(A₀, B₀) = 1 (“unit of measurement”). This means that these assertions also hold
in hyperbolic geometry, and formula (12.19) defines this distance (and the base of
the logarithm in (12.19) is chosen in correspondence with the chosen “unit of mea-
surement”).

Every triple of points A, B, C satisfies the condition 


\[ r(A, B) \le r(A, C) + r(B, C). \tag{12.25} \]


Fig. 12.6 The triangle inequality


This is the familiar triangle inequality, and in many courses in geometry, it is derived 
without use of the parallel postulate, that is, as a theorem of “absolute geometry.”
Thus inequality (12.25) holds as well in hyperbolic geometry. But we shall now give 
a direct (that is, resting directly on formula (12.19)) proof of this due to Hilbert. 

Let us recall that in the model that we have considered, the points of the hyper- 
bolic plane are points of the disk K in the Euclidean plane L, and the lines of the 
hyperbolic plane are the line segments of the plane L that lie inside the disk K. 

Let us consider three points A, B, C in the disk K. We shall denote the points 
of intersection of a line passing through A and B with the boundary of the disk K 
by P and Q, and the analogous points for the line passing through A and C will be 
denoted by U and V, and for the line passing through B and C, by S and T. See 
Fig. 12.6. 

Let us denote the point of intersection of the line AB and the line SU by X, and 
the point of intersection of the line AB and the line TV by Y. Then we have the 
inequality 


\[ DV(A, B, Y, X) > DV(A, B, Q, P). \tag{12.26} \]

Indeed, the left-hand side of (12.26) is equal by definition to

\[
DV(A, B, Y, X) = \frac{|AY| \cdot |BX|}{|BY| \cdot |AX|}, \tag{12.27}
\]

and its right-hand side is given by the relationship (12.20). Therefore, inequality
(12.26) follows from the fact that

\[
\frac{|AY|}{|BY|} > \frac{|AQ|}{|BQ|} \quad\text{and}\quad \frac{|BX|}{|AX|} > \frac{|BP|}{|AP|}. \tag{12.28}
\]

Let us prove the first of inequalities (12.28). Let us define a = |AB|, t₁ = |BQ|,
and t₂ = |BY|. Then we obviously obtain the expressions |AQ|/|BQ| = (a + t₁)/t₁
and |AY|/|BY| = (a + t₂)/t₂. For a > 0, the function (a + t)/t in the variable t
decreases monotonically with increasing t, and therefore, from the fact that t₂ < t₁
(which is obvious from Fig. 12.6) follows the first of inequalities (12.28). Defining
a = |AB|, t₁ = |AX|, and t₂ = |AP|, using completely analogous arguments, we
may prove the second inequality of (12.28).

Let us denote the intersection of the lines SU and TV by W, let us connect this
point with the point C by a line, and let us denote the point of intersection of the line
thus obtained with the line AB by D. Then the points X, A, D, Y and points U, A, C, V are
obtained from each other by a perspective mapping just as was done for the points
Y, B, D, X and T, B, C, S. Then in view of Theorem 9.19, we have the relationships

\[
\frac{|AY| \cdot |DX|}{|DY| \cdot |AX|} = \frac{|AV| \cdot |CU|}{|CV| \cdot |AU|}, \qquad
\frac{|BX| \cdot |DY|}{|DX| \cdot |BY|} = \frac{|BS| \cdot |CT|}{|CS| \cdot |BT|}.
\]

Multiplying these equalities, we have

\[
\frac{|AY| \cdot |BX|}{|BY| \cdot |AX|}
  = \frac{|AV| \cdot |CU|}{|CV| \cdot |AU|} \cdot \frac{|BS| \cdot |CT|}{|CS| \cdot |BT|}.
\]

Taking the logarithm of the last equality, and taking into account (12.27) for
DV(A, B, Y, X), the analogous expression for DV(A, C, V, U) and that for DV(B,
C, S, T), and definition (12.19), we obtain the relationship

\[ \log DV(A, B, Y, X) = r(A, C) + r(B, C), \]

from which, taking into account (12.26), we obtain the required inequality (12.25).
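The inequality just proved can also be spot-checked numerically. The following sketch, which is only an illustration and not part of Hilbert's argument, draws random triples of points in the unit disk and records the smallest value of r(A, C) + r(B, C) − r(A, B). It assumes the natural logarithm in (12.19); the helper names dist and random_point are hypothetical.

```python
import math
import random

def dist(A, B):
    """Hyperbolic distance log DV(A, B, Q, P) in the unit disk (natural log)."""
    dx, dy = B[0] - A[0], B[1] - A[1]
    qc = dx * dx + dy * dy
    if qc == 0.0:
        return 0.0
    qa = A[0] ** 2 + A[1] ** 2 - 1.0
    qac = A[0] * dx + A[1] * dy
    root = math.sqrt(qac * qac - qa * qc)
    t1, t2 = (-qac - root) / qc, (-qac + root) / qc   # P at t1 < 0, Q at t2 > 1
    # On the chord A sits at t = 0 and B at t = 1, so the Euclidean lengths in
    # (12.20) are proportional to differences of the parameter t.
    return math.log((t2 * (1.0 - t1)) / ((t2 - 1.0) * (-t1)))

def random_point(rng, radius=0.95):
    """Random point of the unit disk, kept away from the boundary circle."""
    while True:
        x, y = rng.uniform(-radius, radius), rng.uniform(-radius, radius)
        if x * x + y * y < radius * radius:
            return (x, y)

rng = random.Random(0)
worst = float("inf")
for _ in range(1000):
    A, B, C = (random_point(rng) for _ in range(3))
    worst = min(worst, dist(A, C) + dist(B, C) - dist(A, B))

print(worst >= -1e-9)   # True if (12.25) held, up to rounding, in every trial
```

The small tolerance only absorbs floating-point rounding for nearly collinear triples, where the two sides of (12.25) almost coincide.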

Let us note that if the point B approaches Q along the segment PQ (see
Fig. 12.6), then |BQ| approaches zero, and consequently, r(A, B) approaches in-
finity. This means that despite the fact that the line passing through the points A
and B is represented in our figure by a segment of finite length, its length in the
hyperbolic plane is infinite.

The measurement of angles is similar to that of line segments. As we know, an
arbitrary point O on a line l partitions it into two half-lines. One half-line together
with the point O is called a ray h with center O. Two rays h and k with common
center O are called an angle; we shall assume that the ray h is obtained from k by a
counterclockwise rotation. This angle is denoted by ∠(h, k).

In “absolute geometry,” it is proved that for each angle with vertex at the point
O, there is a unique real number ∡(h, k) satisfying the following four conditions:


1. ∡(h, k) > 0 for all h ≠ k;

2. ∡(k, h) = ∡(h, k);

3. if f is a motion and f(h) = h′, f(k) = k′, and O′ = f(O) is the vertex of the
angle ∠(h′, k′), then ∡(h′, k′) = ∡(h, k).


To formulate the fourth property, we must introduce some additional concepts.
Let the rays h and k forming the angle ∠(h, k) lie on lines l₁ and l₂. The points in
the plane lying on the same side of the line l₁ as the points of the half-line k and on
the same side of the line l₂ as the points of the half-line h are called interior points
of the angle ∠(h, k). A ray l with the same center O as the rays h and k is said to
be an interior ray of the angle ∠(h, k) if it consists of interior points of this angle.

We can now formulate the last property: 


4. If l is an interior ray of the angle ∠(h, k), then ∡(h, l) + ∡(l, k) = ∡(h, k).


As in the case of distance between points, the measure of an angle is defined 
uniquely if we choose a “unit measurement,” that is, if we take a particular angle 
∠(h₀, k₀) as the “unit angle measure.”

We shall point out an explicit method of defining the measure of angles in hyper-
bolic geometry that is realized in the disk K given by the relationship x² + y² < 1
in the Euclidean plane L with coordinates x, y. 

Let ∠(h′, k′) be the angle with center at the point O′, and let f be an arbitrary
motion taking the point O′ to the center O of the disk K. From the definitions, it is
obvious that f takes the half-lines h′ and k′ to some half-lines h and k with center at
the point O. Let us set the measure of ∠(h′, k′) equal to the Euclidean angle between
the half-lines h and k. The main difficulty in this definition is that it uses a motion
f, and therefore, we must prove that the measure of the angle thus obtained does not
depend on the choice of the motion f (of course, with the condition f(O′) = O).

Let g be another motion with the same property that g(O′) = O. Then g⁻¹(O) =
O′, and this means that fg⁻¹(O) = O, that is, the motion fg⁻¹ leaves the point O
fixed. As we saw in Sect. 12.1 (p. 438), a motion possessing such a property is
of type (a), which means that fg⁻¹ corresponds to an orthogonal transformation
of the Euclidean plane L; that is, the angle formed by the half-lines g(h′) and g(k′)
is taken to the angle ∠(h, k) via the orthogonal transformation fg⁻¹, which preserves
the inner product in L and therefore does not change the measure of angles. This
proves the correctness of the definition of angle measure that we have introduced.
Equally easy are the verifications of properties 1-3.

The best-known property of angles in hyperbolic geometry is the following. 


Theorem 12.10 In hyperbolic geometry, the sum of the angles of a triangle is less
than two right angles, that is, less than π.


Since we are talking about a triangle, we can restrict our attention to the plane
in which this triangle lies and assume that we are working in the hyperbolic plane.
The key result is related to the fact that an angle ∠(h, k) in hyperbolic geometry
also determines a Euclidean angle, and we may then compare the measures of these
angles. We shall denote the measure of the angle ∠(h, k) in hyperbolic geometry, as
before, by ∡(h, k), and its Euclidean measure by ∡_E(h, k).


Lemma 12.11 If one ray of the angle ∠(h, k) (for example, h) passes through the
center O of the disk K, then the measure of this angle in the sense of hyperbolic
geometry is less than the Euclidean measure, that is,


\[ \measuredangle(h, k) < \measuredangle_E(h, k). \tag{12.29} \]


First, we shall show how easily Theorem 12.10 follows from the lemma, and then 
we shall prove the lemma itself. 


Proof of Theorem 12.10 Let us denote the vertices of the triangle in question by
A, B, C. Since the measure of an angle is invariant under a motion, it follows by
Theorem 12.5 that we can choose a motion taking one of the vertices of the triangle
(for example, A) to the center O of the disk K. Let the vertices B and C be taken
to B′ and C′. See Fig. 12.7.

Fig. 12.7 A triangle in the hyperbolic plane

It suffices to prove the theorem for the triangle OB′C′. But for the angle
∠B′OC′, we have by definition the equality

\[ \measuredangle B'OC' = \measuredangle_E B'OC', \]

and for the two remaining angles, we have by the lemma the inequalities

\[ \measuredangle OB'C' < \measuredangle_E OB'C', \qquad \measuredangle OC'B' < \measuredangle_E OC'B'. \]

Adding, we obtain for the sum of the angles of triangle OB′C′ the inequality

\[
\measuredangle B'OC' + \measuredangle OB'C' + \measuredangle OC'B'
  < \measuredangle_E B'OC' + \measuredangle_E OB'C' + \measuredangle_E OC'B'.
\]

By a familiar theorem of Euclidean geometry, the sum on the right-hand side is
equal to π, and this proves Theorem 12.10.


Proof of Lemma 12.11 We shall have to use the explicit form of the definition of the
measure of an angle. Let the ray h of the angle ∠(h, k) pass through the point O.
To describe the disk K, we shall introduce a Euclidean rectangular system of co-
ordinates (x, y) and assume that the vertex of angle ∠(h, k) is located at the point
O′ with coordinates (λ, 0), where λ ≠ 0. For this, it is necessary to execute a ro-
tation about the center of the disk in such a way that the point O′ is taken to
some point of the line y = 0 and use the fact that angles are invariant under such a
rotation.

Now we must write down explicitly a motion f of the hyperbolic plane taking the
point O′ to O. We already constructed such a motion in Sect. 12.1; see Example 12.4
on p. 439. There, we proved that there exists a motion of the hyperbolic plane that
takes the point with coordinates (x, y) to the point with coordinates (x′, y′), given
by the relationships

\[
x' = \frac{ax + b}{bx + a}, \qquad y' = \frac{y}{bx + a}, \qquad a^2 - b^2 = 1. \tag{12.30}
\]

Fig. 12.8 Angles in the hyperbolic plane


If we want the point O′ = (λ, 0) to be sent to the origin O = (0, 0), then we
should set aλ + b = 0, or equivalently, λ = −b/a. It is not difficult to verify that it
is possible to represent any number λ with |λ| < 1 in this form. Thus the mapping
(12.30) has the form

\[
x' = \frac{x - \lambda}{1 - \lambda x}, \qquad y' = \frac{y}{a(1 - \lambda x)}. \tag{12.31}
\]

Let the ray k intersect the y-axis at the point A with coordinates (0, μ); see Fig. 12.8.
(We note that this point is not required to be in the disk K.)

From formula (12.31), it is clear that our transformation takes a vertical line x = c
to a vertical line x = c′. The point O is taken to the point Õ = (−λ, 0), the point
A = (0, μ) to the point Ã = (−λ, μ/a), and the vertical line OA to the vertical line
ÕÃ. By the definition of an angle in hyperbolic geometry, ∡OO′A = ∡_E ÕOÃ.
The tangents of the Euclidean angles are known to us:

\[
\tan(\measuredangle_E OO'A) = \frac{\mu}{\lambda}, \qquad
\tan(\measuredangle_E \tilde{O}O\tilde{A}) = \frac{|\tilde{O}\tilde{A}|}{|\tilde{O}O|} = \frac{\mu}{a\lambda};
\]

see Fig. 12.8. Since a² = 1 + b², we have a > 1, and we see that in Euclidean geom-
etry, we have the inequality tan(∡_E ÕOÃ) < tan(∡_E OO′A). The tangent is a strictly
increasing function, and therefore we have the inequality ∡_E ÕOÃ < ∡_E OO′A for
angles that are Euclidean. But ∡OO′A = ∡_E ÕOÃ, and this means that ∡OO′A <
∡_E OO′A.
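The definition of angle measure and the two results above lend themselves to a small numerical experiment. The sketch below moves the vertex of an angle to the center of the disk by a rotation followed by the map (12.31), and then measures the Euclidean angle there. It checks Lemma 12.11 for λ = 0.5, μ = 0.4 and computes the angle sum of one sample triangle. The function names to_center, euclid_angle, and hyp_angle and the sample coordinates are assumptions made for illustration only.

```python
import math

def to_center(V, X):
    """Image of X under a motion taking V to the center O: rotation, then (12.31)."""
    lam = math.hypot(V[0], V[1])
    if lam == 0.0:
        return X
    cos_t, sin_t = V[0] / lam, V[1] / lam
    # Rotate about O so that V lands at (lam, 0) on the x-axis.
    x = cos_t * X[0] + sin_t * X[1]
    y = -sin_t * X[0] + cos_t * X[1]
    a = 1.0 / math.sqrt(1.0 - lam * lam)   # from a^2 - b^2 = 1 with b = -a*lam
    return ((x - lam) / (1.0 - lam * x), y / (a * (1.0 - lam * x)))

def euclid_angle(V, X, Y):
    """Euclidean measure of the angle at V between the rays VX and VY."""
    ux, uy = X[0] - V[0], X[1] - V[1]
    vx, vy = Y[0] - V[0], Y[1] - V[1]
    return abs(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))

def hyp_angle(V, X, Y):
    """Hyperbolic measure: the Euclidean angle at O after moving V to O."""
    origin = (0.0, 0.0)
    return euclid_angle(origin, to_center(V, X), to_center(V, Y))

# Lemma 12.11: vertex O' = (lam, 0), ray h through O, ray k through A = (0, mu).
lam, mu = 0.5, 0.4
O_prime, O, A = (lam, 0.0), (0.0, 0.0), (0.0, mu)
hyp = hyp_angle(O_prime, O, A)
euc = euclid_angle(O_prime, O, A)
print(hyp < euc)                           # hyperbolic measure is the smaller one

# Theorem 12.10: angle sum of a sample triangle with vertices inside the disk.
T = [(0.1, 0.2), (0.7, -0.1), (-0.3, 0.6)]
total = sum(hyp_angle(T[i], T[(i + 1) % 3], T[(i + 2) % 3]) for i in range(3))
print(total, math.pi)                      # the first number is the smaller one
```

In the lemma check the two tangents are μ/(aλ) and μ/λ, exactly as in the proof, with a = 1/√(1 − λ²).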


It is of interest to compare Theorem 12.10 with the analogous result for spheri- 
cal geometry. We have not yet encountered spherical geometry in this course, even 
though it was developed in detail much earlier than hyperbolic geometry, indeed 
in antiquity. In spherical geometry, the role of lines is played by great circles on
the sphere, that is, sections of the sphere obtained by all possible planes passing 
through its center. The analogy between great circles on the sphere and lines in the 
plane consists in the fact that the arc of the great circle joining points A and B has 
length no greater than that of any other curve on the sphere with endpoints A and B. 
This arc length of a great circle (which, of course, depends also on the radius R of 
the sphere) is called the distance on the sphere from point A to point B. 


Fig. 12.9 A triangle on the sphere


The measurement of lengths and angles on the sphere can generally be defined 
in exactly the same way as in Euclidean or hyperbolic geometry. Here the angle 
between two “lines” (that is, great circles) is equal to the value of the dihedral angle 
formed by the planes passing through these great circles. We have the following 
result. 


Theorem 12.12 The sum of the angles of a triangle on the sphere is greater than
two right angles, that is, greater than π.


Proof Let there be given a triangle with vertices A, B,C on a sphere of radius R. 
Let us draw all the great circles whose arcs are the sides AB, AC, and BC of triangle 
ABC. See Fig. 12.9. 

Let us denote by Σ_A the part of the sphere enclosed between the great circle
passing through the points A, B and the great circle passing through A, C. We in-
troduce the analogous notation Σ_B and Σ_C. Let us denote by A the measure of the
dihedral angle BAC and similarly for B and C. Then the assertion of the theorem
is equivalent to asserting that A + B + C > π.

But it is easy to see that the area of $\Sigma_A$ is the same fraction of the area of the
sphere as $2A$ is of $2\pi$. Since the area of the sphere is equal to $4\pi R^2$, it follows that
the area of $\Sigma_A$ is equal to

\[ 4\pi R^2 \cdot \frac{2A}{2\pi} = 4R^2 A. \]

Similarly, we obtain expressions for the areas of $\Sigma_B$ and $\Sigma_C$; they are equal to $4R^2 B$
and $4R^2 C$ respectively. Let us now observe that the regions $\Sigma_A$, $\Sigma_B$, and $\Sigma_C$
together cover the entire sphere. Here each point of the sphere not part of triangle
ABC or of triangle $A'B'C'$ symmetric to it on the sphere belongs to only one of
the regions $\Sigma_A$, $\Sigma_B$, and $\Sigma_C$, and every point in triangle ABC or the symmetric
triangle $A'B'C'$ is contained in all three regions. Denoting by $S_{ABC}$ the area of
triangle ABC, we therefore have

\[ 4R^2(A + B + C) = 4\pi R^2 + 2S_{ABC} + 2S_{A'B'C'} = 4\pi R^2 + 4S_{ABC}. \]




From this we obtain the relationship 


\[ A + B + C = \pi + \frac{S_{ABC}}{R^2}, \tag{12.32} \]


from which it follows that $A + B + C > \pi$.
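The relationship between the angle sum and the area can be checked numerically on a concrete triangle. The following Python sketch is only an illustration under stated assumptions: the helper names (`cross`, `dot`, `vertex_angle`, `angle_sum`) are our own, the vertices are given as unit vectors pointing from the center of the sphere, and the angle at a vertex is computed as the dihedral angle between the planes of the two great circles through that vertex, as described above.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(u, v))

def vertex_angle(a, b, c):
    """Angle at vertex a: dihedral angle between the planes (O, a, b) and (O, a, c)."""
    n1, n2 = cross(a, b), cross(a, c)
    cosine = dot(n1, n2) / math.sqrt(dot(n1, n1) * dot(n2, n2))
    return math.acos(max(-1.0, min(1.0, cosine)))

def angle_sum(a, b, c):
    """Sum of the three angles of the spherical triangle with vertices a, b, c."""
    return vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b)

# One octant of a sphere of radius R: its area is one eighth of 4*pi*R^2.
R = 2.0
octant = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
total = angle_sum(*octant)
area_from_angles = R ** 2 * (total - math.pi)   # area predicted by the angle excess
area_of_octant = 4 * math.pi * R ** 2 / 8

# A small triangle near the point (1, 0, 0): the excess is small but positive.
small = ((1, 0, 0),
         (math.cos(0.3), math.sin(0.3), 0.0),
         (math.cos(0.2), 0.0, math.sin(0.2)))
```

For the octant, each angle is a right angle, so the sum is $3\pi/2$ and the predicted area agrees with one eighth of the area of the sphere.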


Formula (12.32) gives an example of a series of relationships systematically
developed by Lobachevsky: if we were to assume that $R^2 < 0$ (that is, R is a purely
imaginary number), then clearly, we would obtain from (12.32) the inequality

\[ A + B + C < \pi, \]

which is Theorem 12.10 of hyperbolic geometry. This is why Lobachevsky
considered that his geometry is realized “on a sphere of imaginary radius.” However,
the analogy between theorems obtained on the basis of the negation of the “fifth
postulate” and formulas obtained from those of spherical geometry by replacing $R^2$
with a negative number had already been noted by many mathematicians working
on these questions (some even as early as the eighteenth century).

The reader should be warned that spherical geometry is inconsistent with
the system of axioms that we considered in Sect. 12.2. It fails one of the
fundamental axioms of relationship, since on the sphere several different lines can pass
through two distinct points. Indeed, infinitely many great circles pass through any
two antipodal points on the sphere. In connection with this, Riemann proposed
another geometry, less radically different from Euclidean geometry. We shall describe
it in the two-dimensional case.

For this, we shall use a description of the projective plane $\Pi$ as the collection of
all lines in three-dimensional space passing through some point O. Let us consider
the sphere S with center at O. Every point $P \in S$ together with the center O of
the sphere determines a line $l$, that is, some point Q of the projective plane $\Pi$. The
association $P \mapsto Q$ defines a mapping of the sphere S to the projective plane $\Pi$
whereby great circles on the sphere are taken precisely to lines of $\Pi$. Clearly, exactly
two points of the sphere are mapped to a single point $Q \in \Pi$: together with the point
P, there is also the second point of the intersection of the line $l$ with the sphere, that
is, the antipodal point $P'$. But Euclidean motions taking the sphere S into itself (we
might call them motions of spherical geometry) give certain transformations defined
on the projective plane $\Pi$ and satisfying the axioms of motion. It is possible as well
to transfer the measures of lengths and angles from the sphere S to the projective
plane $\Pi$. Then we have the analogue of Theorem 12.12 from spherical geometry.

This branch of geometry is called elliptic geometry.⁸ In elliptic geometry, every
pair of lines intersects, since such is the case in the projective plane. Thus there are no
parallel lines. However, in “absolute geometry,” it is proved that there exists at least
one line passing through any given point A not lying on a given line $l$ that is parallel
to $l$. This means that in elliptic geometry, not all the axioms of “absolute geometry”
are satisfied. The reason for this is easily ascertained: in elliptic geometry, there
is no natural concept of “lying between.” Indeed, a great circle of the sphere S is
mapped to a line $l$ of the projective plane $\Pi$, where two antipodal points of the
sphere (A and $A'$, B and $B'$, C and $C'$, and so on) are taken to one point of the
plane $\Pi$. See Fig. 12.10. It is clear from the figure that in elliptic geometry, we may
assume equally well that the point C does or does not lie between A and B.

⁸ Elliptic geometry is sometimes called Riemannian geometry, but that term is usually reserved for
the branch of differential geometry that studies Riemannian manifolds.

Fig. 12.10 Elliptic geometry

Nevertheless, elliptic geometry possesses the property of “free mobility.”
Moreover, one can prove (the Helmholtz–Lie theorem) that among all geometries (assuming
some rigorous definition of this term), only three, namely Euclidean, hyperbolic,
and elliptic, possess this property.


Chapter 13 
Groups, Rings, and Modules 


13.1 Groups and Homomorphisms 


The concept of a group is defined axiomatically, analogously to the notions of vec- 
tor, inner product, and affine space. Such an abstract definition is justified by the 
wealth of examples of groups throughout all of mathematics. 


Definition 13.1 A group is a set G on which is defined an operation that assigns
to each pair of elements of this set some third element; that is, there is defined
a mapping $G \times G \to G$. The element associated with the elements $g_1$ and $g_2$ by
this rule is called their product and is denoted by $g_1 \cdot g_2$ or simply $g_1 g_2$. For this
mapping, the following conditions must also be satisfied:


(1) There exists an element $e \in G$ such that for every $g \in G$, we have the
relationships $eg = g$ and $ge = g$. This element is called the identity.¹

(2) For each element $g \in G$, there exist an element $g' \in G$ such that $gg' = e$ and an
element $g'' \in G$ such that $g''g = e$. The element $g'$ is called a right inverse, and
the element $g''$ is called a left inverse of the element $g$.

(3) For every triple of elements $g_1, g_2, g_3 \in G$, the following relationship holds:

\[ (g_1 g_2) g_3 = g_1 (g_2 g_3). \tag{13.1} \]


This last property is called associativity, and it is a property that we have already
met repeatedly, for example in connection with the composition of mappings and
matrix multiplication, and also in the construction of the exterior algebra. We
considered the associative property in its most general form on p. xv, where we proved
that equality (13.1) makes it possible to define the product of an arbitrary number
of factors $g_1 g_2 \cdots g_k$, which then depends only on the order of the factors and not
on the arrangement of parentheses in the product. The reasoning given there applies,
obviously, to every group.

¹ The identity element of a group is unique. Indeed, if there existed another identity element $e' \in G$,
then by definition, we would have the equalities $ee' = e'$ and $ee' = e$, from which it follows that
$e = e'$.

I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry,
DOI 10.1007/978-3-642-30994-6_13, © Springer-Verlag Berlin Heidelberg 2013

The condition of associativity has other important consequences. From it
derives, for example, the fact that if $g'$ is a right inverse of $g$, and $g''$ is a left inverse,
then

\[ g''(gg') = g''e = g'', \qquad g''(gg') = (g''g)g' = eg' = g', \]

from which it follows that $g' = g''$. Thus the left and right inverses of any given
element $g \in G$ coincide. This unique element $g' = g''$ is called simply the inverse
of $g$ and is denoted by $g^{-1}$.
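For a finite set, the conditions of Definition 13.1 can be verified by brute force. The following Python sketch is ours and is meant only as an illustration: the function name `is_group` and the three examples (arithmetic modulo 5) are assumptions chosen for the demonstration, not part of the text.

```python
from itertools import product

def is_group(elements, op):
    """Check the conditions of Definition 13.1 for a finite set with operation op."""
    elements = list(elements)
    # The operation must be a mapping G x G -> G.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Condition (1): a two-sided identity element.
    identities = [e for e in elements
                  if all(op(e, g) == g and op(g, e) == g for g in elements)]
    if not identities:
        return False
    e = identities[0]
    # Condition (2): every element has a right inverse and a left inverse.
    for g in elements:
        has_right = any(op(g, h) == e for h in elements)
        has_left = any(op(h, g) == e for h in elements)
        if not (has_right and has_left):
            return False
    # Condition (3): associativity, equality (13.1).
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(elements, repeat=3))

def add_mod5(a, b):
    return (a + b) % 5

def mul_mod5(a, b):
    return (a * b) % 5

additive = is_group(range(5), add_mod5)             # residues under addition
with_zero = is_group(range(5), mul_mod5)            # 0 has no inverse
without_zero = is_group(range(1, 5), mul_mod5)      # nonzero residues
```

The second example fails only at condition (2): the element 0 has no inverse under multiplication.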


Definition 13.2 If the number of elements belonging to a group G is finite, then the 
group G is called a finite group, and otherwise, it is called an infinite group. The 
number of distinct elements in a finite group G is called its order and is denoted by 
|G|. 


Let M be an arbitrary set, and let us consider the collection of all bijective
mappings between M and itself. Such mappings are also called transformations of the
set M. In the introductory section of this book, we defined the operation of
composition (that is, the sequential application) of arbitrary mappings of arbitrary sets
(p. xiv). It follows from the properties proved there that the collection of all
transformations of a set M together with the operation of composition forms a group,
where the inverse of each transformation $f : M \to M$ is given by the inverse
mapping $f^{-1} : M \to M$, while the identity is obviously given by the identity mapping
on the set M. Such groups are called transformation groups, and it is with these that
the majority of applications of groups are associated.

It is sometimes necessary to consider not all the transformations of a set, but to 
limit our consideration to some subset. The situation that thus arises can be formu- 
lated conveniently as follows: 


Definition 13.3 A subset $G' \subset G$ of elements of a group G is called a subgroup of
G if the following conditions are satisfied:

(a) For every pair of elements $g_1, g_2 \in G'$, their product $g_1 g_2$ is again in $G'$.
(b) $G'$ contains the identity element $e$.
(c) For every $g \in G'$, its inverse $g^{-1}$ is again in $G'$.


It is obvious that a subgroup G’ is itself a group. Thus from the group of all 
transformations, we obtain a set of examples (indeed, the majority of examples of 
groups). Let us enumerate some that are met most frequently. 
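Conditions (a), (b), (c) of Definition 13.3 are easy to test for subsets of a small transformation group. In the following Python sketch (an illustration only; the names `compose`, `inverse`, and `is_subgroup` are ours), a transformation of the set {0, 1, ..., n − 1} is stored as the tuple of images of its elements, which is an assumption about representation and not something prescribed by the text.

```python
from itertools import permutations

def compose(f, g):
    """Composition f∘g: first apply g, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def inverse(f):
    """Inverse mapping of the bijection f."""
    inv = [0] * len(f)
    for i, image in enumerate(f):
        inv[image] = i
    return tuple(inv)

def is_subgroup(subset, n):
    """Check conditions (a), (b), (c) for a set of transformations of {0, ..., n-1}."""
    subset = set(subset)
    identity = tuple(range(n))
    closed = all(compose(f, g) in subset for f in subset for g in subset)   # (a)
    has_identity = identity in subset                                       # (b)
    has_inverses = all(inverse(f) in subset for f in subset)                # (c)
    return closed and has_identity and has_inverses

all_transformations = set(permutations(range(3)))
rotations = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}    # cyclic shifts of three elements
two_swaps = {(0, 1, 2), (1, 0, 2), (0, 2, 1)}    # identity and two interchanges
```

The set `two_swaps` contains the identity and all inverses but violates condition (a): the composition of the two interchanges is a cyclic shift.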


Example 13.4 The following sets are groups under the operation of composition of 
mappings. 


1. the set of nonsingular linear transformations of a vector space;
2. the set of orthogonal transformations of a Euclidean space;
3. the set of proper orthogonal transformations of a Euclidean space;
4. the set of Lorentz transformations of a pseudo-Euclidean space;
5. the set of nonsingular affine transformations of an affine space;
6. the set of projective transformations of a projective space;
7. the set of motions of an affine Euclidean space;
8. the set of motions of a hyperbolic space.


All the groups enumerated above are groups of transformations (the set M is
obviously the underlying set of the given space). Let us note that in the case of
vector and affine spaces, there is the crucial requirement of the nonsingularity of the
linear or affine transformations that guarantees the bijectivity of each mapping and
thus the existence of an inverse element for each element of the group.²

However, not all naturally occurring groups are groups of transformations. For 
example, with respect to the operation of addition, the set of all integers forms a 
group, as do the sets of the rational, real, and complex numbers, and likewise, the 
set of all vectors belonging to any arbitrary vector space. 

Let us remark that the axioms of motion 1, 2, and 3 introduced in Sect. 12.2
can be expressed together as a single requirement, namely that the motions form a
group.


Example 13.5 Let us consider a finite set M consisting of n elements. A
transformation $f : M \to M$ is called a permutation, and the group of all permutations of the
set M is called the symmetric group of degree n and is denoted by $S_n$. It is obvious
that the group $S_n$ is finite.

We considered permutations earlier, in Sect. 2.6, in connection with the notions
of symmetric and antisymmetric functions, and we saw that for defining a
permutation $f : M \to M$, one can introduce a numeration of the elements of the set M,
that is, one can write the set in the form $M = \{a_1, \ldots, a_n\}$ and designate the
images $f(a_1), \ldots, f(a_n)$ of all the elements $a_1, \ldots, a_n$. Namely, let $f(a_1) = a_{j_1}, \ldots,
f(a_n) = a_{j_n}$. Then a permutation is defined by the matrix

\[ A = \begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix}, \tag{13.2} \]

where in the upper row are written in succession all the natural numbers from 1
to n, and in the lower row, under the number k stands the number $j_k$ such that
$f(a_k) = a_{j_k}$. Since a permutation $f : M \to M$ is a bijective mapping, it follows that
the lower row contains all the numbers from 1 to n, except that they are written in
some other order. In other words, $(j_1, \ldots, j_n)$ is some permutation of the numbers
$(1, \ldots, n)$.


² Unfortunately, there is a certain amount of disagreement over terminology, of which the reader
should be aware: above, we defined a transformation of a set as a bijective mapping into itself, while 
at the same time, a linear (or affine) transformation of a vector (or affine) space is not by definition 
necessarily bijective, and to have bijectivity here, it is necessary to specify that the transformations 
be nonsingular. 




Writing a permutation in the form (13.2) allows us in particular to ascertain
easily that $|S_n| = n!$. Let us prove this by induction on n. For n = 1, this is obvious: the
group $S_1$ contains the single permutation that is the identity mapping on the set M
consisting of a single element. Let n > 1. Then by enumerating the elements of the
set M in every possible way, we obtain a bijection between $S_n$ and the set of
matrices A of the form (13.2), whose first row contains the elements $1, \ldots, n$, and the
elements $j_1, \ldots, j_n$ of the second row take all possible values from 1 to n. Let $A'$ be
the matrix obtained from A by deleting its last column, containing the element $j_n$.
Let us fix this element: $j_n = k$. Then the elements $j_1, \ldots, j_{n-1}$ of the matrix $A'$
assume all possible values from the collection of the n − 1 numbers $(1, \ldots, \tilde{k}, \ldots, n)$,
where the symbol ~, as before, denotes the omission of the corresponding element.
It is clear that the set of all possible matrices $A'$ is in bijective correspondence with
$S_{n-1}$, and by the induction hypothesis, the number of distinct matrices $A'$ is equal to
$|S_{n-1}| = (n-1)!$. But since the element $j_n = k$ can be equal to any natural number
from 1 to n, the number of distinct matrices A is equal to $n(n-1)! = n!$. This gives
us the equality $|S_n| = n!$.
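The counting argument can be mirrored in a few lines of Python. This is a sketch for illustration only: the function names are ours, and the lower rows of the matrix (13.2) are generated with the standard library function `itertools.permutations` rather than by the inductive construction itself.

```python
from itertools import permutations
from math import factorial

def bottom_rows(n):
    """All possible lower rows (j_1, ..., j_n) of the matrix (13.2)."""
    return list(permutations(range(1, n + 1)))

def rows_with_last(n, k):
    """Lower rows whose last element j_n is fixed to be k."""
    return [row for row in bottom_rows(n) if row[-1] == k]

def count_by_induction(n):
    """The count from the proof: n choices of j_n, times the count for n - 1."""
    return 1 if n == 1 else n * count_by_induction(n - 1)
```

For each fixed value of the last element there are as many rows as there are permutations of the remaining n − 1 numbers, which is the inductive step of the proof.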

Let us note that the numeration of the elements of the set M used for writing 
down permutations plays the same role as the introduction of coordinates (that is, a 
basis) in a vector space. Furthermore, the matrix (13.2) is analogous to the matrix 
of a linear transformation of a space, which is defined only after the choice of a 
basis and depends on that choice. However, for our further purposes, it will be more 
convenient to use concepts that are not connected with such a choice of numeration 
of elements. 

We shall use the concept of transposition, which was introduced in Sect. 2.6
(p. 45). The definition given there can be formulated as follows. Let a and b be two
distinct elements of the set M. Then a transposition is a permutation of the set M
that interchanges the places of the elements a and b and leaves all other elements of
the set M fixed. Denoting such a transposition by $\tau_{a,b}$, we can express this definition
by the relationships

\[ \tau_{a,b}(a) = b, \quad \tau_{a,b}(b) = a, \quad \tau_{a,b}(x) = x \tag{13.3} \]

for all $x \neq a$ and $x \neq b$.

In this notation, Theorem 2.23 from Sect. 2.6 can be formulated as follows: every
permutation g of a finite set is the product of a finite number of transpositions, that
is,

\[ g = \tau_{a_1,b_1} \tau_{a_2,b_2} \cdots \tau_{a_k,b_k}. \tag{13.4} \]

As we saw in Sect. 2.6, in relationship (13.4), the number k and the choice of
elements $a_1, b_1, \ldots, a_k, b_k$ for the given permutation g are not uniquely defined. This
means that for a given permutation g, the representation (13.4) is not unique.
However, as was proved in Sect. 2.6 (Theorem 2.25), the parity of the number k of a
permutation g is uniquely determined. Permutations for which the number k in the
representation (13.4) is even are called even, and those for which the number k is
odd are called odd.
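One possible way of producing a representation of the form (13.4) is to split the permutation into cycles and to write each cycle as a product of transpositions. The Python sketch below does this; it is an illustration and not the construction used in Sect. 2.6. The names are ours, the elements of M are assumed to be labeled 0, 1, ..., n − 1, and a product of permutations is understood as composition in which the rightmost factor is applied first.

```python
from itertools import permutations

def compose(f, g):
    """Composition f∘g: first apply g, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def transposition(n, a, b):
    """The transposition of {0, ..., n-1} interchanging a and b."""
    images = list(range(n))
    images[a], images[b] = b, a
    return tuple(images)

def to_transpositions(perm):
    """Pairs (a_1, b_1), ..., (a_k, b_k) whose product, as in (13.4), equals perm."""
    n = len(perm)
    seen = [False] * n
    factors = []
    for start in range(n):
        if seen[start]:
            continue
        # Collect the cycle start -> perm[start] -> ... -> start.
        cycle = [start]
        seen[start] = True
        x = perm[start]
        while x != start:
            cycle.append(x)
            seen[x] = True
            x = perm[x]
        # The cycle (c0 c1 ... cm) equals t(c0,cm) ... t(c0,c2) t(c0,c1).
        factors.extend((cycle[0], c) for c in reversed(cycle[1:]))
    return factors

def product_of(factors, n):
    """Multiply the transpositions in the listed order (rightmost applied first)."""
    result = tuple(range(n))
    for a, b in factors:
        result = compose(result, transposition(n, a, b))
    return result

def is_even(perm):
    """Parity of the number k of transpositions found for perm."""
    return len(to_transpositions(perm)) % 2 == 0

def count_even(n):
    """Number of even permutations of n elements."""
    return sum(1 for p in permutations(range(n)) if is_even(p))
```

Other decompositions of the same permutation may use a different number of transpositions, but by Theorem 2.25 the parity reported by `is_even` does not depend on that choice.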




Example 13.6 The collection of all even permutations of n elements forms a
subgroup of the symmetric group $S_n$ (it obviously satisfies conditions (a), (b), (c) in
the definition of a subgroup). It is called the alternating group of degree n and is
denoted by $A_n$.


Definition 13.7 Let g be an element of G. Then for every natural number n, the
element $g^n = g \cdots g$ (n-fold product) is defined. For a negative integer m, the element
$g^m$ is equal to $(g^{-1})^{-m}$, and for zero, we have $g^0 = e$.

It is easily verified that for arbitrary integers m and n, we have the relationship

\[ g^m g^n = g^{m+n}. \]

From this, it is clear that the collection of elements of the form $g^n$, where n runs
over the set of integers, forms a subgroup. It is called the cyclic subgroup generated
by the element g and is denoted by $\{g\}$.

There are two cases that can occur: 


(a) All the elements $g^n$, as n runs through the set of integers, are distinct. In this
case, we say that g is an element of infinite order in the group G.

(b) For some integers m and n, $m \neq n$, we have the equality $g^m = g^n$. Then,
obviously, $g^{m-n} = e$. This means that there exists a natural number k (for instance
$|m - n|$) such that $g^k = e$. In this case, we say that g is an element of finite order
in the group G.


If g is an element of finite order, then the smallest natural number k such that
$g^k = e$ is called the order of the element g. If for some integer n, we have $g^n = e$,
then the number n is an integer multiple of the order k of the element g. Indeed,
if such were not the case, then we could divide the number n by k with nonzero
remainder: $n = qk + r$, where $0 < r < k$. From the equalities $g^n = e$ and $g^k = e$, we
could conclude that $g^r = e$, in contradiction to the definition of the order k. If in the
group G there exists an element g such that $G = \{g\}$, then the group G is called a
cyclic group. It is obvious that if $G = \{g\}$ and the element g has finite order k, then
$|G| = k$. Indeed, in this case, $e, g, g^2, \ldots, g^{k-1}$ are all the distinct elements of the
group G.
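The notions of order and cyclic subgroup can be illustrated in the group of nonzero residues modulo 7 under multiplication. This example is our own choice and does not appear in the text, and the function names below are likewise ours. The sketch computes the order of an element, the cyclic subgroup it generates, and the exponents n for which $g^n = e$.

```python
from math import gcd

def order(g, modulus):
    """Smallest k >= 1 with g**k == 1 (mod modulus); g must be invertible."""
    if gcd(g, modulus) != 1:
        raise ValueError("g is not invertible modulo the modulus")
    k, x = 1, g % modulus
    while x != 1:
        x = (x * g) % modulus
        k += 1
    return k

def cyclic_subgroup(g, modulus):
    """The set of all powers of g modulo the modulus."""
    return {pow(g, n, modulus) for n in range(order(g, modulus))}

def exponents_giving_identity(g, modulus, bound):
    """All n with 1 <= n < bound such that g**n == 1 (mod modulus)."""
    return [n for n in range(1, bound) if pow(g, n, modulus) == 1]
```

For instance, the element 2 has order 3 modulo 7, and the exponents n with $2^n = 1$ are exactly the multiples of 3, while the element 3 has order 6 and generates the whole group, which is therefore cyclic.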

Now we shall move on to discuss mappings of groups (homomorphisms), which
play a role in group theory analogous to that of linear transformations of vector
spaces in linear algebra. Let G and $G'$ be any two groups, and let $e \in G$ and $e' \in G'$
be their identity elements.

Definition 13.8 A mapping $f : G \to G'$ is called a homomorphism if for every pair
of elements $g_1$ and $g_2$ of the group G, we have the relationship

\[ f(g_1 g_2) = f(g_1) f(g_2), \tag{13.5} \]

where it is obviously implied that on the left- and right-hand sides of equality (13.5),
the juxtaposition of elements indicates the multiplication operation in the respective
group (on the left, in G; on the right, in $G'$).




From equality (13.5), it is easy to derive the simplest properties of
homomorphisms:

1. $f(e) = e'$;
2. $f(g^{-1}) = (f(g))^{-1}$ for every $g \in G$;
3. $f(g^n) = (f(g))^n$ for every $g \in G$ and every integer n.

For the proof of the first property, let us set $g_1 = g_2 = e$ in formula (13.5). Then
taking into account the equality $e = ee$, which is obvious from the definition of the
identity element, we obtain that

\[ f(e) = f(ee) = f(e) f(e). \]

It remains only to multiply both sides of the relationship $f(e) = f(e) f(e)$ by the
element $(f(e))^{-1}$ of the group $G'$, after which we obtain the required equality
$e' = f(e)$. The second property follows at once from the first: setting in (13.5) $g_1 = g$
and $g_2 = g^{-1}$, and taking into account the equality $e = gg^{-1}$, we obtain

\[ e' = f(e) = f(gg^{-1}) = f(g) f(g^{-1}), \]

from which, by the definition of the inverse element, it follows that $f(g^{-1}) =
(f(g))^{-1}$. Finally, the third property is obtained for positive n by induction from
(13.5), for n = 0 it coincides with property 1, and for negative n, it is also necessary
to apply property 2.
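As an illustration of relationship (13.5) and of properties 1 and 2, let us take the mapping that assigns to a permutation the number +1 if it is even and −1 if it is odd. That this mapping is a homomorphism is a standard fact, which the Python sketch below merely confirms by enumeration for n = 4. The sketch is ours: the function names are illustrative, permutations are stored as tuples of images of 0, ..., n − 1, and the parity is computed by counting inversions, which is an assumption about method and not taken from the text.

```python
from itertools import permutations

def compose(f, g):
    """Composition f∘g: first apply g, then f."""
    return tuple(f[g[i]] for i in range(len(g)))

def inverse(f):
    """Inverse mapping of the bijection f."""
    inv = [0] * len(f)
    for i, image in enumerate(f):
        inv[image] = i
    return tuple(inv)

def sign(p):
    """+1 for an even permutation, -1 for an odd one (parity of the inversion count)."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def satisfies_13_5(n):
    """Check sign(fg) == sign(f) * sign(g) for all pairs of permutations of n elements."""
    group = list(permutations(range(n)))
    return all(sign(compose(f, g)) == sign(f) * sign(g)
               for f in group for g in group)
```

Since the inverse of ±1 under multiplication is the same number, property 2 here takes the form `sign(inverse(f)) == sign(f)`.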


Definition 13.9 A mapping $f : G \to G'$ is called an isomorphism if it is a
homomorphism that is also a bijection. Groups G and $G'$ are said to be isomorphic if
there exists an isomorphism $f : G \to G'$. This is denoted as follows: $G \simeq G'$.


Example 13.10 Assigning to each nonsingular linear transformation of a vector 
space L of dimension n its matrix (in some fixed basis of the space L), we obtain an 
isomorphism between the group of nonsingular linear transformations of this space 
and the group of nonsingular square matrices of order n. 


The notion of isomorphism plays the same role in group theory as the notion of
isomorphism plays in the theory of vector spaces, and the notion of homomorphism
plays the same role as the notion of arbitrary linear transformation (in vector spaces
of arbitrary dimension). The analogy between these concepts is revealed particularly
in the fact that the answer to the question whether a homomorphism $f : G \to G'$ is
an isomorphism can be formulated in terms of its image and kernel, just as was the
case for linear mappings.

The image of a homomorphism f is the set $f(G)$, that is, simply the image of
f as a mapping of sets $G \to G'$. It follows from relationship (13.5) that $f(G)$ is a
subgroup of $G'$. The kernel of a homomorphism f is the set of elements $g \in G$ such
that $f(g) = e'$. It is likewise not difficult to conclude from (13.5) that the kernel is a
subgroup of G.

Using the notions of image and kernel, we may say that a homomorphism
$f : G \to G'$ is an isomorphism if and only if its image consists of the entire group
$G'$ and its kernel consists of only the identity element $e \in G$. The proof of this
assertion is based on relationship (13.5) and properties 1 and 2: if for two
elements $g_1$ and $g_2$ of a group G, we have the equality $f(g_1) = f(g_2)$, then through
right multiplying both sides by the element $(f(g_1))^{-1}$ of the group $G'$, we obtain
$e' = f(g_2)(f(g_1))^{-1} = f(g_2 g_1^{-1})$. Thus $g_2 g_1^{-1}$ belongs to the kernel, from which it
follows that $g_2 g_1^{-1} = e$, that is, $g_1 = g_2$.
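The criterion in terms of image and kernel can be tried out on two small mappings. In the Python sketch below (ours, for illustration only), G is the group of residues modulo 6 under addition, $G'$ is the group of nonzero residues modulo 7 under multiplication, and the mappings send n to $3^n$ and to $2^n$ modulo 7. Both the choice of groups and all names are assumptions made for the example.

```python
G = list(range(6))                 # residues modulo 6 under addition
G_PRIME = {1, 2, 3, 4, 5, 6}       # nonzero residues modulo 7 under multiplication

def f(n):
    """n -> 3**n modulo 7."""
    return pow(3, n, 7)

def h(n):
    """n -> 2**n modulo 7."""
    return pow(2, n, 7)

def is_homomorphism(phi):
    """Relationship (13.5): phi(a + b) == phi(a) * phi(b) in the respective groups."""
    return all(phi((a + b) % 6) == (phi(a) * phi(b)) % 7 for a in G for b in G)

def kernel(phi):
    """Elements of G mapped to the identity 1 of G'."""
    return {g for g in G if phi(g) == 1}

def image(phi):
    """The set phi(G)."""
    return {phi(g) for g in G}

def is_isomorphism(phi):
    """Homomorphism with trivial kernel whose image is all of G'."""
    return is_homomorphism(phi) and kernel(phi) == {0} and image(phi) == G_PRIME
```

Both mappings are homomorphisms, but only `f` is an isomorphism: the kernel of `h` is {0, 3} and its image is the proper subgroup {1, 2, 4}.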

It is important, however, to note that the analogy between isomorphisms of 
groups and isomorphisms of vector spaces does not extend all that far: most of the 
theorems from Chap. 3 do not have suitable analogues for groups, even for finite 
groups. For example, one of the most important results of Chap. 3 (Theorem 3.64) 
states that all vector spaces of a given finite dimension are isomorphic to one an- 
other. But there exist even finite groups of a given order that are not isomorphic; see 
Example 13.24 on p. 484. 

Another property of groups is related to whether the product of elements in a
group depends on the order in which they are multiplied. In the definition of a group,
no condition of this sort was imposed, and therefore, we may assume that in general,
$g_1 g_2 \neq g_2 g_1$. Very frequently, such is the case. For example, nonsingular square
matrices of a given order n with the standard operation of matrix multiplication
form a group, and as the example presented in Sect. 2.9 on p. 64 shows, already for
n = 2, it is generally the case that $AB \neq BA$.
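A pair of noncommuting nonsingular matrices of order 2 is easy to exhibit. The matrices in the following Python sketch are our own choice and are not necessarily those of the example in Sect. 2.9; the helper names `matmul` and `det` are likewise ours.

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

A = [[1, 1],
     [0, 1]]
B = [[1, 0],
     [1, 1]]

AB = matmul(A, B)   # [[2, 1], [1, 1]]
BA = matmul(B, A)   # [[1, 1], [1, 2]]
```

Both matrices have determinant 1, so they belong to the group of nonsingular matrices, and yet the two products differ.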


Definition 13.11 If in a group G the equality $g_1 g_2 = g_2 g_1$ holds for every pair of
elements $g_1, g_2 \in G$, then G is called a commutative group or, more usually, an
abelian group.³

³ Named in honor of the Norwegian mathematician Niels Henrik Abel (1802-1829).

For example, the groups of integers, rational numbers, real numbers, and complex
numbers with the operation of addition are all abelian. Likewise, a vector space is
an abelian group with respect to the operation of vector addition. It is easy to see
that every cyclic group is abelian.

Let us present one result that holds for all finite groups but that is especially easy
to prove (and we shall use it frequently in the sequel) for abelian groups.

Lemma 13.12 For every finite abelian group G, the order of each of its elements
divides the order of the group.

Proof Let us denote by $g_1, g_2, \ldots, g_n$ the complete set of elements of G (so we
obviously have $n = |G|$), and let us right multiply each of them by some element
$g \in G$. The elements thus obtained, $g_1 g, g_2 g, \ldots, g_n g$, will again all be distinct.
Indeed, given the equality $g_i g = g_j g$, right multiplying both sides by $g^{-1}$ yields the
equality $g_i = g_j$. Since the group G contains n elements altogether, it follows that
the elements $g_1 g, g_2 g, \ldots, g_n g$ are the same as the elements $g_1, g_2, \ldots, g_n$, though
perhaps arranged in some other order:

\[ g_1 g = g_{i_1}, \quad g_2 g = g_{i_2}, \quad \ldots, \quad g_n g = g_{i_n}. \]

On multiplying these equalities, we obtain

\[ (g_1 g)(g_2 g) \cdots (g_n g) = g_{i_1} g_{i_2} \cdots g_{i_n}. \tag{13.6} \]

Since the group G is abelian, we have

\[ (g_1 g)(g_2 g) \cdots (g_n g) = g_1 g_2 \cdots g_n g^n, \]

and since $g_{i_1}, g_{i_2}, \ldots, g_{i_n}$ are the same elements $g_1, g_2, \ldots, g_n$, then setting
$h = g_1 g_2 \cdots g_n$, we obtain from (13.6) the equality $h g^n = h$. Left multiplying both sides
of the last equality by $h^{-1}$, we obtain $g^n = e$. As we saw above, it then follows that
the order of the element g divides the number $n = |G|$.
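The steps of this proof can be followed on a concrete finite abelian group. In the Python sketch below we take, as an assumption made for the example, the group of residues modulo m that are relatively prime to m, under multiplication (it is a standard fact that these form an abelian group). The function names are ours.

```python
from math import gcd

def units(m):
    """Residues modulo m that are relatively prime to m, in increasing order."""
    return [x for x in range(1, m) if gcd(x, m) == 1]

def order(g, m):
    """Smallest k >= 1 with g**k == 1 (mod m), for g relatively prime to m."""
    k, x = 1, g % m
    while x != 1:
        x = (x * g) % m
        k += 1
    return k

def check_lemma(m):
    """Follow the proof of Lemma 13.12 for every element g of the group."""
    G = units(m)
    n = len(G)
    for g in G:
        # Multiplying all elements by g only rearranges them.
        if sorted((x * g) % m for x in G) != G:
            return False
        # Consequently g**n is the identity ...
        if pow(g, n, m) != 1:
            return False
        # ... and the order of g divides n = |G|.
        if n % order(g, m) != 0:
            return False
    return True
```

For m = 15 the group has 8 elements, and the orders of its elements are 1, 2, or 4.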


Definition 13.13 Let $H_1, H_2, \ldots, H_r$ be subgroups of G. The group G is called
the direct product of the subgroups $H_1, H_2, \ldots, H_r$ if for all elements $h_i \in H_i$ and
$h_j \in H_j$ from distinct subgroups, we have the relationship $h_i h_j = h_j h_i$, and every
element $g \in G$ can be represented in the form

\[ g = h_1 h_2 \cdots h_r, \quad h_i \in H_i, \; i = 1, 2, \ldots, r, \]

and for each element $g \in G$, such a representation is unique. The fact that the group
G is a direct product of subgroups $H_1, H_2, \ldots, H_r$ is denoted by

\[ G = H_1 \times H_2 \times \cdots \times H_r. \tag{13.7} \]


In the case of abelian groups, a different terminology is usually used, related to
the majority of examples of interest. Namely, the operation defined on the group
is called addition instead of multiplication, and it is denoted not by $g_1 g_2$, but by
$g_1 + g_2$. In keeping with this notation, the identity element is called the zero element
and is denoted by 0, and not by e. The inverse element is called the negative or
additive inverse and is denoted not by $g^{-1}$, but by $-g$, and the exponential notation
$g^n$ is replaced by the multiplicative notation $ng$, which is defined similarly:
$ng = g + \cdots + g$ (n-fold sum) if n > 0, $ng = (-g) + \cdots + (-g)$ (|n|-fold sum) if n < 0,
and $ng = 0$ if n = 0. The definition of homomorphism remains exactly the same
in this case, where it is required only to replace in formula (13.5) the symbol for the
group operation:

\[ f(g_1 + g_2) = f(g_1) + f(g_2). \]

Properties 1-3 here take the following form:

1. $f(0) = 0$;
2. $f(-g) = -f(g)$ for all $g \in G$;
3. $f(ng) = n f(g)$ for all $g \in G$ and for every integer n.

This terminology agrees with the example of the set of integers and with the
example, considered earlier, of vectors, which form an abelian group with
respect to the operation of addition.




In the case of abelian groups (with the operation of addition), instead of the
direct product of subgroups $H_1, H_2, \ldots, H_r$ one speaks of their direct sum. Then
the definition of the direct sum reduces to the condition that every element $g \in G$
can be represented in the form

\[ g = h_1 + h_2 + \cdots + h_r, \quad h_i \in H_i, \; i = 1, 2, \ldots, r, \]

and that for each element $g \in G$, the representation is unique. It is obvious that this
last requirement is equivalent to the requirement that the equality
$h_1 + h_2 + \cdots + h_r = 0$ be possible only if $h_1 = 0, h_2 = 0, \ldots, h_r = 0$. That a group G is the direct
sum of subgroups $H_1, H_2, \ldots, H_r$ is denoted by

\[ G = H_1 \oplus H_2 \oplus \cdots \oplus H_r. \tag{13.8} \]

It is obvious that in both cases (13.7) and (13.8), the order of the group G is equal
to

\[ |G| = |H_1| \cdot |H_2| \cdots |H_r|. \]

In perfect analogy to how things were done in Sect. 3.1 for vector spaces, we may
define the direct product (or direct sum) of groups that in general are not originally
the subgroups of any particular group and that even, perhaps, are of completely
different natures from one another.
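The uniqueness requirement in the definition of a direct sum can be tested directly in the group of residues modulo n under addition. The following Python sketch is an illustration with names and examples of our own choosing: it counts, for each element g, the number of representations $g = h_1 + \cdots + h_r$ with $h_i$ taken from the given subgroups.

```python
from itertools import product

def representation_counts(n, subgroups):
    """For each g in Z_n, the number of ways to write g = h_1 + ... + h_r."""
    counts = [0] * n
    for choice in product(*subgroups):
        counts[sum(choice) % n] += 1
    return counts

def is_direct_sum(n, subgroups):
    """True if every element of Z_n has exactly one representation."""
    return all(c == 1 for c in representation_counts(n, subgroups))

# Z_6 is the direct sum of its subgroups of orders 2 and 3.
H1, H2 = [0, 3], [0, 2, 4]
# In Z_4, taking the subgroup {0, 2} twice does not give a direct sum.
K1, K2 = [0, 2], [0, 2]
```

In the first example the order of the group is 6 = 2 · 3, in agreement with the formula for |G|. In the second, the element 0 has the two representations 0 + 0 and 2 + 2, and the elements 1 and 3 have none.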


Example 13.14 If we map every orthogonal transformation U of a Euclidean space
to its determinant |U|, which, as we know, is equal to +1 or −1, we obtain a
homomorphism of the group of orthogonal transformations into the symmetric group
$S_2$ of order 2. If we map every Lorentz transformation U of a pseudo-Euclidean
space to the pair of numbers $\varepsilon(U) = (|U|, \nu(U))$, defined in Sect. 7.8, we obtain a
homomorphism of the group of Lorentz transformations into the group $S_2 \times S_2$.


Example 13.15 Let (V, L) be an affine Euclidean space of dimension n and G the
group of its motions. Then the assertion of Theorem 8.37 can be formulated as the
equality $G = T_n \times O_n$, where $T_n$ is the group of translations of the space V, and $O_n$
is the group of orthogonal transformations of the space L. Let us note that $T_n \simeq L$,
where L is understood as a group under the operation of vector addition. Indeed, let
us define the mapping $f : T_n \to L$ that to each translation $\mathcal{T}_a$ by the vector a assigns
this vector a. Obviously, the mapping f is bijective, and by virtue of the property
$\mathcal{T}_a \mathcal{T}_b = \mathcal{T}_{a+b}$, it is an isomorphism. Thus Theorem 8.37 can be formulated as the
relationship $G \simeq L \times O_n$.


13.2 Decomposition of Finite Abelian Groups 


Later in this chapter we shall restrict our attention to the study of finite groups.
The highest goal in this area of group theory is to find a construction that gives a
description of all finite groups. But such a goal is far from accessible; at least at
present, we are far from attaining it. However, for finite abelian groups, the answer
to this question turns out to be unexpectedly simple. Moreover, both the answer and
its proof are very similar to Theorem 5.12 on the decomposition of a vector space
as a direct sum of cyclic subspaces. For the proof, we shall require the following
lemmas.


Lemma 13.16 Let B be a subgroup of A, and a an element of the group A of
order k. If there exists a number $m \in \mathbb{N}$ relatively prime to k such that $ma \in B$, then
a is an element of B.

Proof Since the numbers m and k are relatively prime, there exist integers r and s
such that $kr + ms = 1$. Multiplying $ma$ by s and adding $kra$ to the result (which is
equal to zero, since k is the order of the element a), we obtain a. But $sma = s(ma)$
belongs to the subgroup B. From this, it follows that a is also an element of B.
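The proof is constructive, and its steps can be carried out explicitly in the group of residues modulo n under addition. In the Python sketch below (ours, for illustration), the integers r and s are found by the extended Euclidean algorithm. The function names and the choice of group are assumptions made for the example, and the order of a residue a modulo n is computed from the standard formula n / gcd(a, n).

```python
from math import gcd

def extended_gcd(x, y):
    """Return (d, r, s) with x*r + y*s == d == gcd(x, y)."""
    if y == 0:
        return x, 1, 0
    d, r1, s1 = extended_gcd(y, x % y)
    return d, s1, r1 - (x // y) * s1

def order_in_Zn(a, n):
    """Order of the element a in the additive group of residues modulo n."""
    return n // gcd(a, n)

def recover(a, n, m):
    """Rebuild a from ma as in the proof: a = s(ma) + r(ka), where kr + ms = 1."""
    k = order_in_Zn(a, n)
    d, r, s = extended_gcd(k, m)
    if d != 1:
        raise ValueError("m must be relatively prime to the order of a")
    return (s * (m * a) + r * (k * a)) % n

def generated_by(b, n):
    """The cyclic subgroup of Z_n generated by b."""
    return {(t * b) % n for t in range(n)}
```

For example, in the residues modulo 12 the element a = 4 has order k = 3. With m = 2 we have 3 · 1 + 2 · (−1) = 1, and indeed −(2 · 4) = −8 is congruent to 4 modulo 12, so a lies in every subgroup containing ma = 8.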


Lemma 13.17 If $A = \{a\}$ is a cyclic group of order n, and we set $b = ma$, where
$m \in \mathbb{N}$ is relatively prime to n, then the cyclic subgroup $B = \{b\}$ generated by the
element b coincides with A.

Proof Since $a \in A$, we have by Lemma 13.12 that the order k of the element a
divides the order of the group A, which is equal to n, and the relative primality
of the numbers m and n implies the relative primality of the numbers k and m.
From Lemma 13.16, it follows that $a \in B$, which means that $A \subset B$, and since we
obviously have also $B \subset A$, we obtain the required equality $B = A$.


Corollary 13.18 Under the assumptions of Lemma 13.17, every element $c \in A$ can
be expressed in the form

\[ c = md, \quad d \in A, \; m \in \mathbb{Z}. \tag{13.9} \]

Indeed, if in the notation of Lemma 13.17, the group A is the group $\{b\}$, then the
element c has the form $kb$, and since $b = ma$, we obtain equality (13.9) in which
$d = ka$.
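In the cyclic group of residues modulo n under addition, Lemma 13.17 and Corollary 13.18 say that multiplication by a number m relatively prime to n reaches every element. The Python sketch below checks this by direct search. It is an illustration of ours, with hypothetical function names, and it also shows what happens when m and n have a common divisor.

```python
def multiples(m, n):
    """The set of all elements m*d, where d runs through the residues modulo n."""
    return {(m * d) % n for d in range(n)}

def solve(c, m, n):
    """Find d with m*d == c modulo n by direct search, or return None."""
    for d in range(n):
        if (m * d) % n == c % n:
            return d
    return None
```

For n = 10 and m = 3 every residue c has the form 3d, whereas for m = 4, which is not relatively prime to 10, only the even residues are obtained.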


Definition 13.19 A subgroup B of a group A is said to be maximal if $B \neq A$ and B
is contained in no subgroup of A other than B itself and A.


It is obvious that there exist maximal subgroups in every finite group that consists
of more than just a single element. Indeed, beginning with the identity subgroup
(that is, the subgroup consisting of a single element), we can include it, if it is
not itself maximal, in some strictly larger subgroup $B_1$ different from A. If in $B_1$ we have not
yet obtained a maximal subgroup, then we can include it in some strictly larger subgroup $B_2$
different from A. Continuing this process, we eventually can go no further, since
all the subgroups $B_1, B_2, \ldots$ are contained in the finite group A. The last subgroup
obtained when we stop the process will be maximal. We remark that we do not assert
(nor is it true) that the maximal subgroup we have constructed is unique.


Lemma 13.20 For every maximal subgroup B of a finite abelian group A, there
exists an element $a \in A$ not belonging to B such that the smallest number $m \in \mathbb{N}$ for
which $ma$ belongs to B is prime, and every element $x \in A$ can be represented in the
form

\[ x = ka + b, \tag{13.10} \]

for k an integer and $b \in B$.

Later, we shall denote the prime number m that appears in Lemma 13.20 by p.


Proof of Lemma 13.20 Let us take as $a$ any element of the group $A$ not belonging
to the subgroup $B$. The collection of all elements of the form $ka + b$, where $k$ is
an arbitrary integer and $b$ an arbitrary element of $B$, obviously forms a subgroup
containing $B$ (it is easy to see that $B$ consists of elements $x$ such that in the
representation $x = ka + b$, the number $k$ is equal to 0). It is obvious that this subgroup
does not coincide with $B$, since it contains the element $a$ (for $k = 1$ and $b = 0$), and
this means, in view of the maximality of the subgroup $B$, that it coincides with $A$.
From this follows the representation (13.10) for every element $x$ in the group $A$.

It remains to prove that for some prime number $p$, the element $pa$ belongs to $B$.
Since the element $a$ is of finite order, we must have $na = 0$ for some $n > 0$. In
particular, $na \in B$. Let us take the smallest $m \in \mathbb{N}$ for which $ma \in B$ and prove that
it is prime.

Suppose that such is not the case, and that $p$ is a prime divisor of $m$. Then
$m = pm_1$ for some integer $m_1 < m$. Let us set $a_1 = m_1 a$. As we have seen, the collection
of all elements of the form $ka_1 + b$ (for arbitrary integer $k$ and $b \in B$) forms a
subgroup of the group $A$ containing $B$. If the element $a_1$ were contained in $B$,
then that would contradict the choice of $m$ as the smallest natural number such that
$ma \in B$. This means that $a_1 \notin B$, and in view of the maximality of the subgroup $B$,
the subgroup that we constructed of elements of the form $ka_1 + b$ coincides with $A$.
In particular, it contains the element $a$, that is, $a = ka_1 + b$ for some $k$ and $b$. From
this, it follows that $pa = kpa_1 + pb$. But $pa_1 = pm_1 a = ma \in B$, and since $pb \in B$,
this means that $pa \in B$, which contradicts the minimality of $m$, since $p < m$. This
means that the assumption that $m$ has prime divisors less than $m$ is false, and so
$m = p$ is a prime number.
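
Lemma 13.20 can be checked by brute force in a small case. The sketch below again uses the integers modulo $n$ as the group $A$ (an assumption made only for illustration) and relies on the standard fact that every subgroup of a cyclic group is cyclic. The helper names are hypothetical.

```python
def subgroup(d, n):
    """Subgroup of Z_n generated by d."""
    return frozenset((k * d) % n for k in range(n))

def is_prime(m):
    return m > 1 and all(m % q for q in range(2, int(m ** 0.5) + 1))

def maximal_subgroups(n):
    """Maximal subgroups of Z_n, found by comparing all proper subgroups."""
    whole = frozenset(range(n))
    proper = [B for B in {subgroup(d, n) for d in range(n)} if B != whole]
    # B is maximal if no proper subgroup strictly contains it
    return [B for B in proper if not any(B < C for C in proper)]

def smallest_multiple_in(a, B, n):
    """Smallest natural number m with m*a in B (exists because n*a = 0 is in B)."""
    m = 1
    while (m * a) % n not in B:
        m += 1
    return m

n = 12
for B in maximal_subgroups(n):
    for a in (x for x in range(n) if x not in B):
        # the smallest m with ma in B is prime
        assert is_prime(smallest_multiple_in(a, B, n))
        # every x in A has the form ka + b with b in B, as in (13.10)
        assert {(k * a + b) % n for k in range(n) for b in B} == set(range(n))
```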


Remark 13.21 We chose as $a$ an arbitrary element of the group $A$ not contained
in $B$. In particular, in place of $a$, we could as well choose any element $a' = a + b$,
where $b \in B$. Indeed, from $a = a' - b$ and $a' \in B$ it would follow that we would
also have $a \in B$.


We can now state the fundamental theorem of abelian groups. 



Theorem 13.22 Every finite abelian group is the direct sum of cyclic subgroups 
whose orders are equal to powers of prime numbers. 


Thus, the theorem asserts that every finite abelian group $A$ has the decomposition

$A = A_1 \oplus \cdots \oplus A_r,$ (13.11)


where the subgroups $A_i$ are cyclic, that is, $A_i = \{a_i\}$, and their orders are powers of
prime numbers, that is, $|A_i| = p_i^{m_i}$, where $p_i$ are prime numbers.
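
For a cyclic group the decomposition (13.11) can be made explicit. The sketch below is not the inductive argument used in the proof; it assumes the group is the integers modulo $n$ and uses reduction modulo the prime-power factors of $n$ (the Chinese remainder theorem) to exhibit the direct summands. All function names are hypothetical.

```python
def prime_power_factors(n):
    """Prime-power factors of n, e.g. 360 -> [8, 9, 5]."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            q = 1
            while n % p == 0:
                n //= p
                q *= p
            factors.append(q)
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def components(x, orders):
    """Send x in Z_n to its residues modulo each prime power."""
    return tuple(x % q for q in orders)

n = 360
orders = prime_power_factors(n)           # orders of the cyclic summands
images = {components(x, orders) for x in range(n)}

# The map is additive and hits n distinct tuples, so it is an isomorphism
# of Z_360 onto the direct sum of cyclic groups of orders 8, 9, and 5.
assert len(images) == n
```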


Proof of Theorem 13.22 Our proof is by induction on the order of the group $A$. For
the group of order 1, the theorem is obvious. Therefore, to prove the theorem for a
group $A$, we may assume that it has been proved for all subgroups $B \subset A$, $B \neq A$,
since for an arbitrary subset $B \subset A$ with $B \neq A$, the number of elements of $B$ is less
than $|A|$.

In particular, let $B$ be a maximal subgroup of the group $A$. By the induction
hypothesis, the theorem is valid for this subgroup, and it therefore has the
decomposition


$B = C_1 \oplus \cdots \oplus C_r,$ (13.12)


in which the $C_i$ are cyclic subgroups each of which has order the power of a prime
number:


$C_i = \{c_i\}, \quad p_i^{m_i} c_i = 0.$


Lemma 13.20 holds for the subgroup $B$; let $a \in A$, $a \notin B$, be the element provided
for in the formulation of this lemma. By hypothesis, every element $x \in B$ can be
represented in the form


$x = k_1 c_1 + \cdots + k_r c_r.$


In particular, this holds for the element $b = pa$ (in the notation of Lemma 13.20):

$pa = k_1 c_1 + \cdots + k_r c_r.$


Let us select the terms $k_i c_i$ in this decomposition that can be written in the form
$p d_i$, where $d_i \in C_i$. These are first of all, the terms $k_i c_i$ for $i$ such that $p_i \neq p$.
This follows from Corollary 13.18. Moreover, all elements of the form $k_i c_i$ possess
this property if $p_i = p$ and $k_i$ is divisible by $p$. Let the chosen elements be $k_i c_i$,
$i = 1, \ldots, s-1$. Then for the remaining elements $k_i c_i$, $i = s, \ldots, r$, we have $p_i = p$
and $k_i$ is not divisible by $p$. Setting


$k_i c_i = p d_i, \quad d_i \in C_i,\ i = 1, \ldots, s-1, \quad d_1 + \cdots + d_{s-1} = d,$ (13.13)


we obtain


$pa = pd + k_s c_s + \cdots + k_r c_r.$


We can now use the freedom in the choice of the element $a \in A$, which was
mentioned in Remark 13.21, and take instead of $a$, the element $a' = a - d$, since $d \in B$
in view of formula (13.13). We then have


$pa' = k_s c_s + \cdots + k_r c_r.$ (13.14)

There are now two possible cases.


Case 1. The number $s - 1$ is equal to $r$, and then equality (13.14) gives


$pa' = 0.$

In this case, the group $A$ decomposes as a direct sum of cyclic subgroups as follows:

$A = C_1 \oplus \cdots \oplus C_r \oplus C_{r+1},$


where $C_{r+1} = \{a'\}$ is a subgroup of order $p$.

Indeed, Lemma 13.20 asserts that every element $x \in A$ can be represented in the
form $ka' + b$, and since in view of (13.12), the element $b$ can be represented in the
form


$b = k_1 c_1 + \cdots + k_r c_r,$

it follows that $x$ has the form

$x = k_1 c_1 + \cdots + k_r c_r + ka'.$ (13.15)


This proves the first condition in the definition of a direct sum.
Let us prove the uniqueness of representation (13.15). For this, it suffices to prove
that the equality


$k_1 c_1 + \cdots + k_r c_r + ka' = 0$ (13.16)

is possible only for $k_1 c_1 = \cdots = k_r c_r = ka' = 0$. Let us rewrite (13.16) in the form

$ka' = -k_1 c_1 - \cdots - k_r c_r.$ (13.17)


This means that the element $ka'$ belongs to $B$. If the number $k$ were not divisible by
$p$, then $k$ and $p$ would be relatively prime. Since the element $a'$ has order $p$, we would
then obtain by Lemma 13.16 that $a' \in B$. But this contradicts the choice of
the element $a$ and the construction of the element $a'$. This means that $p$ must divide
$k$, and since $pa' = 0$, it follows that we also have $ka' = 0$. Thus equality (13.17) is
reduced to $k_1 c_1 + \cdots + k_r c_r = 0$, and from the fact that the group $B$ is the direct
sum of subgroups $C_1, \ldots, C_r$, we obtain that $k_1 c_1 = 0, \ldots, k_r c_r = 0$.
Case 2. The number $s - 1$ is less than $r$. Let us set $k_s c_s = d_s, \ldots, k_r c_r = d_r$, and
for $i = 1, \ldots, s-1$, let us set $d_i = c_i$. By Lemma 13.17, the element $d_i$ generates
the same cyclic subgroup $C_i$ as $c_i$. For $i \le s-1$, this assertion is a tautology, and
for $i > s-1$, it follows from the fact that the numbers $k_i$ are by assumption not
divisible by $p$, and $p^{m_i} c_i = 0$ for all $i \ge s$. Equality (13.14) can then be rewritten as
follows:


$pa' = d_s + \cdots + d_r.$ (13.18)


Let $m_s \le \cdots \le m_r$. Let us denote by $C'_r$ the cyclic group generated by the element
$a'$, that is, let us set $C'_r = \{a'\}$. Let us prove that the order of the element $a'$, and
therefore the order of the group $C'_r$, is equal to $p^{m_r+1}$:


$|C'_r| = p^{m_r+1}.$ (13.19)

Indeed, in view of (13.18), we have

$p^{m_r+1} a' = p^{m_r} d_s + \cdots + p^{m_r} d_r = 0,$


since $p^{m_i} d_i = 0$, $m_i \le m_r$. On the other hand, in view of relationship (13.18), we
have


$p^{m_r} a' = p^{m_r-1} d_s + \cdots + p^{m_r-1} d_r \neq 0,$


since $p^{m_r-1} d_r \neq 0$, and in view of (13.12), the sum of the elements $p^{m_r-1} d_i \in C_i$
cannot equal 0 if at least one term is not equal to 0. This proves (13.19).
Now let us prove that


$A = C_1 \oplus \cdots \oplus C_{r-1} \oplus C'_r,$ (13.20)

that is, that every element $x \in A$ can be uniquely represented in the form

$x = y_1 + \cdots + y_{r-1} + y'_r, \quad y_1 \in C_1, \ldots, y_{r-1} \in C_{r-1},\ y'_r \in C'_r.$ (13.21)


First let us prove the possibility of representation (13.21). Since every element
$x \in A$ can be represented in the form $ka' + b$, $b \in B$, it suffices to prove that it is
possible to represent separately $a'$ and an arbitrary element $b \in B$ in the form (13.21).
This is obvious for the element $a'$, since it belongs to the cyclic group $C'_r = \{a'\}$. As
for elements of $B$, each $b \in B$ can be represented in the form


$b = k_1 d_1 + \cdots + k_r d_r,$


according to formula (13.12) and in view of the fact that $C_i = \{d_i\}$. Therefore, it
suffices to prove that each of the elements $d_i$ can be represented in the form (13.21).
For $d_1, \ldots, d_{r-1}$, this is obvious, since


$d_i \in C_i = \{d_i\}, \quad i = 1, \ldots, r-1.$

Finally, in view of (13.18), we have

$d_r = -d_s - \cdots - d_{r-1} + pa',$


and this is the representation of the element $d_r$ that we need.



Let us now prove the uniqueness of representation (13.21). For this, it suffices to
prove that the equality


$k_1 d_1 + \cdots + k_{r-1} d_{r-1} + k_r a' = 0$ (13.22)

is possible only for $k_1 d_1 = \cdots = k_{r-1} d_{r-1} = k_r a' = 0$. Let us suppose that $k_r$ is
relatively prime to $p$. Then

$k_r a' = -k_1 d_1 - \cdots - k_{r-1} d_{r-1},$


and in view of the fact that $p^{m_r+1} a' = 0$, we obtain by Lemma 13.16 that $a' \in B$.
But the element $a \in A$ was chosen as an element not belonging to the subgroup $B$.
This means that the element $a'$ also does not belong to $B$, and we have arrived at a
contradiction.

Let us now consider the case in which the number $k_r$ is divisible by $p$. Let
$k_r = pl$. Then


$pla' = -k_1 d_1 - \cdots - k_{r-1} d_{r-1}.$


Let us replace $pa'$ on the left-hand side of this relationship by the expression
$d_s + \cdots + d_r$ on the basis of equality (13.18). On transferring all terms to the
left-hand side, we obtain


$l d_s + \cdots + l d_r + k_1 d_1 + \cdots + k_{r-1} d_{r-1} = 0.$


From the fact that by hypothesis, the group $B$ is the direct sum of groups $C_1, \ldots, C_r$,
it follows that in this equality, $l d_r = 0$. Since the order of the element $d_r$ is equal
to $p^{m_r}$, this is possible only if $p^{m_r}$ divides $l$, and this means that $p^{m_r+1}$ divides $k_r$.
But we have seen that the order of the element $a'$ is equal to $p^{m_r+1}$, and this means
that $k_r a' = 0$. Then it follows from equality (13.22) that $k_1 d_1 + \cdots + k_{r-1} d_{r-1} = 0$.
And since by the induction hypothesis, the group $B$ is the direct sum of the groups
$C_1, \ldots, C_r$, it follows that $k_1 d_1 = \cdots = k_{r-1} d_{r-1} = 0$. This completes the proof of
the theorem.


13.3 The Uniqueness of the Decomposition 


The theorem on the uniqueness of the Jordan normal form has an analogue in the 
theory of finite abelian groups. 


Theorem 13.23 For different decompositions of the finite abelian group $A$ into a
direct sum of cyclic subgroups whose orders are prime powers, whose existence is
established in Theorem 13.22,


$A = A_1 \oplus \cdots \oplus A_r, \quad |A_i| = p_i^{m_i},$ (13.23)


the orders $p_i^{m_i}$ of the cyclic subgroups $A_i$ are unique. In other words, if


$A = A'_1 \oplus \cdots \oplus A'_s$


is another such decomposition, then $s = r$, and the subgroups $A'_i$ can be reordered
in such a way that the equality $|A'_i| = |A_i|$ is satisfied for all $i = 1, \ldots, r$.


Proof We shall show how the orders of the cyclic subgroups in the decomposition
(13.23) are uniquely determined by the group $A$ itself. For any natural number $k$, let
us denote by $kA$ the collection of elements $a$ of the group $A$ that can be represented
in the form $a = kb$, where $b$ is some element of this group. It is obvious that the
collection of elements $kA$ forms a subgroup of the group $A$. Let us prove that the
orders $|kA|$ of these subgroups (for various $k$) determine the orders of the cyclic
groups $|A_i|$ in the decomposition (13.23).

Let us consider an arbitrary prime number $p$ and analyze the case that $k$ is a
power of the prime number $p$, that is, $k = p^i$. Let us factor the order $|p^i A|$ of the
group $p^i A$ into a product of a power of $p$ and a number $n_i$ relatively prime to $p$:


$|p^i A| = p^{r_i} n_i, \quad (n_i, p) = 1.$ (13.24)


On the other hand, for a prime number $p$, let us denote by $l_i$ the number of subgroups
$A_j$ of order $p^i$ appearing in the decomposition (13.23). We shall present an explicit
formula that expresses the numbers $l_i$ in terms of $r_i$. Since these latter numbers are
determined only by the group $A$, it follows that the numbers $l_i$ also do not depend
on the decomposition (13.23) (in particular, they are equal to zero if and only if all
prime numbers $p_j$ for which $|A_j| = p_j^{m_j}$ differ from $p$).

First of all, let us calculate the order of the group $A$ in another way. Let us note
that $A = p^0 A$, so that this is the case $i = 0$. The definition of the number $l_i$ shows
that in the decomposition (13.23), we have $l_1$ groups of order $p$, $l_2$ groups of order
$p^2$, ..., and the remaining groups have orders relatively prime to $p$. Hence it follows
that


$|A| = p^{l_1} p^{2 l_2} \cdots n_0, \quad (n_0, p) = 1.$


Let us set

$|A| = p^{r_0} n_0, \quad (n_0, p) = 1.$


Then we can write the relationship above in the form

$l_1 + 2 l_2 + 3 l_3 + \cdots = r_0.$ (13.25)


Now let us consider the case that $k = p^i > 1$, that is, the number $i$ is greater
than 0. First of all, it is obvious that for every natural number $k$, it follows from
(13.23) that


$kA = kA_1 \oplus kA_2 \oplus \cdots \oplus kA_r.$


It is obvious that all properties of a direct sum are satisfied.
Now, as in the case examined above, let us calculate the order of the group $p^i A$
in another way. It is obvious that $|p^i A| = |p^i A_1| \cdots |p^i A_r|$. If for some $j$, we have
$|A_j| = p_j^{m_j}$ and $p_j \neq p$, then Lemma 13.17 shows that $p^i A_j = A_j$, and we have
$|p^i A_j| = |A_j| = p_j^{m_j}$, which is relatively prime to $p$. Thus in the decomposition
$|p^i A| = |p^i A_1| \cdots |p^i A_r|$, all the factors $|p^i A_j|$, where $|A_j| = p_j^{m_j}$ and $p_j \neq p$,
together give a number that is relatively prime to $p$, and in formula (13.24), they
make no contribution to the number $r_i$. It remains to consider the case $p_j = p$. Since
$A_j$ is a cyclic group, it follows that $A_j = \{a_j\}$. It is then clear that $p^i A_j = \{p^i a_j\}$.
Let us find the order of the element $p^i a_j$. Since $p^{m_j} a_j = 0$, we have
$p^{m_j - i}(p^i a_j) = 0$ if $i \le m_j$, and $p^i a_j = 0$ if $i \ge m_j$.

Let us prove that $p^{m_j - i}$ is precisely the same as the order of the element $p^i a_j$.
Let this order be equal to some number $s$. Then $s$ must divide $p^{m_j - i}$, which means
that it is of the form $p^t$. If $t < m_j - i$, then the equality $p^t(p^i a_j) = 0$ would show
that $p^{i+t} a_j = 0$, that is, that the element $a_j$ had order less than $p^{m_j}$. This means
that $|p^i A_j| = p^{m_j - i}$ for $i \le m_j$. The fact that $p^i A_j = 0$ for $i \ge m_j$ (which means
that $|p^i A_j| = 1$) is obvious.

We can now literally repeat the argument that we used earlier. We see that in the
decomposition


$p^i A = p^i A_1 \oplus p^i A_2 \oplus \cdots \oplus p^i A_r,$


subgroups of order $p$ occur when $m_j - i = 1$, that is, $m_j = i + 1$, and this means that
in our adopted notation, they occur $l_{i+1}$ times. Likewise, the subgroups of order $p^2$
occur when $m_j = i + 2$, that is, $l_{i+2}$ times, and so on. Moreover, certain subgroups
will have order relatively prime to $p$. This means that


$|p^i A| = p^{l_{i+1}} p^{2 l_{i+2}} \cdots n_i$, where $(n_i, p) = 1$.

In other words, in accordance with our previous notation, we have

$l_{i+1} + 2 l_{i+2} + \cdots = r_i.$ (13.26)


In particular, formula (13.25) is obtained from (13.26) for $i = 0$.
If we now subtract from each formula (13.26) the following one, we obtain that
for all $i = 1, 2, \ldots$, we have the equalities


$l_i + l_{i+1} + \cdots = r_{i-1} - r_i.$

Repeating the same process, we obtain


$l_i = r_{i-1} - 2 r_i + r_{i+1}.$


These relationships prove Theorem 13.23. 
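
The formula $l_i = r_{i-1} - 2r_i + r_{i+1}$ can be tried out numerically. In the sketch below a group is given by the list of orders of its cyclic summands, and the order of $kA$ is computed from the standard fact that the multiples of $k$ in a cyclic group of order $n$ form a subgroup of order $n / \gcd(n, k)$. The function names and the sample group are illustrative assumptions.

```python
from math import gcd

def order_of_kA(k, orders):
    """|kA| for A a direct sum of cyclic groups with the given orders."""
    size = 1
    for n in orders:
        size *= n // gcd(n, k)
    return size

def p_exponent(n, p):
    """Largest r such that p**r divides n."""
    r = 0
    while n % p == 0:
        n //= p
        r += 1
    return r

def multiplicities(orders, p, max_i):
    """l_i for i = 1..max_i, recovered from r_i alone via l_i = r_{i-1} - 2 r_i + r_{i+1}."""
    r = [p_exponent(order_of_kA(p ** i, orders), p) for i in range(max_i + 2)]
    return {i: r[i - 1] - 2 * r[i] + r[i + 1] for i in range(1, max_i + 1)}

# A = Z_2 + Z_4 + Z_4 + Z_8 + Z_9 + Z_5 (sample data)
sample = [2, 4, 4, 8, 9, 5]
print(multiplicities(sample, 2, 3))   # counts of summands of order 2, 4, 8
```

The numbers $r_i$ depend only on the orders $|p^i A|$, so the counts returned here do not depend on how the summands are listed.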


Theorems 13.22 and 13.23 make it easy to give the number of distinct (up to 
isomorphism) finite abelian groups of a given order. 


Example 13.24 Suppose, for example, that we would like to determine the number
of distinct abelian groups of order $p^3 q^2$, where $p$ and $q$ are distinct prime numbers.
Theorem 13.22 shows that such a group can be represented in the form


$A = C_1 \oplus \cdots \oplus C_s,$


where $C_i$ are cyclic groups whose orders are prime powers. From this
decomposition, it follows that


$|A| = |C_1| \cdots |C_s|.$


In other words, among the groups $C_i$, there is either one cyclic group of order $p^3$, or
one of order $p^2$ and one of order $p$, or three of order $p$. And likewise, there is one
of order $q^2$ or two of order $q$. Combining all these possibilities (three for groups
of order $p^i$ and two for groups of order $q^j$), we obtain six variants. Theorem 13.23
guarantees that of the six groups thus obtained, none is isomorphic to any of the
others.
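
The count in Example 13.24 generalizes: for each prime $p$ dividing the order with exponent $e$, the possible lists of orders $p^{m_1}, p^{m_2}, \ldots$ correspond to the ways of writing $e$ as a sum of natural numbers. A minimal sketch of this count, with hypothetical function names, might look as follows.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def partitions(n, largest=None):
    """Number of ways to write n as a sum of natural numbers, each at most `largest`."""
    if largest is None:
        largest = n
    if n == 0:
        return 1
    # choose the largest part k, then partition the rest into parts <= k
    return sum(partitions(n - k, min(k, n - k))
               for k in range(1, min(n, largest) + 1))

def factor_exponents(n):
    """Exponents in the prime factorization of n, as {prime: exponent}."""
    exps, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        exps[n] = exps.get(n, 0) + 1
    return exps

def count_abelian_groups(n):
    """Number of abelian groups of order n up to isomorphism."""
    total = 1
    for e in factor_exponents(n).values():
        total *= partitions(e)
    return total

# order 72 = 2^3 * 3^2 is of the form p^3 q^2 considered in the example
print(count_abelian_groups(72))
```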


13.4 Finitely Generated Torsion Modules over a Euclidean Ring* 


The proofs of the theorem on finite abelian groups and the theorem on Jordan nor- 
mal form (just like the proofs of the corresponding uniqueness theorems) are so 
obviously parallel to each other that they surely are special cases of some more 
general theorems. This is indeed the case, and the main goal of this chapter is the 
proof of these general theorems. For this, we shall need two abstract (that is, defined 
axiomatically) notions. 


Definition 13.25 A ring is a set $R$ on which are defined two operations (that is, two
mappings $R \times R \to R$), one of which is called addition (for which an element that
is the image of two elements $a \in R$ and $b \in R$ is called their sum and is denoted by
$a + b$), and the second of which is multiplication (the element that is the image of
$a \in R$ and $b \in R$ is called their product and is denoted by $ab$). For these operations
of addition and multiplication, the following conditions must be satisfied:


(1) With respect to the operation of addition, the ring is an abelian group (the
identity element is denoted by 0).
(2) For all $a, b, c \in R$, we have


$a(b + c) = ab + ac, \quad (b + c)a = ba + ca.$

(3) For all $a, b, c \in R$, the associative property holds:


$a(bc) = (ab)c.$


In the sequel, we shall denote a ring by the letter $R$ and assume that it has a
multiplicative identity, that is, that it contains an element, which we shall denote by
1, satisfying the condition


$a \cdot 1 = 1 \cdot a = a$ for all $a \in R$.


In this chapter, we shall be considering only commutative rings, that is, it will be
assumed that


$ab = ba$ for all $a, b \in R$.



We have already encountered the most important special case of a ring, namely
an algebra, in connection with the construction of the exterior algebra of a vector
space, in Chap. 10. Let us recall that an algebra is a ring that is a vector space, where,
of course, consistency of the notions entering into these definitions is assumed. This
means that for every scalar $\alpha$ (in the field over which the vector space in question is
defined) and for all elements $a, b$ of the ring $R$, we have the equality $(\alpha a)b = \alpha(ab)$.
On the other hand, we are quite familiar with an example of a ring that is not an
algebra in any natural sense, namely the ring of integers $\mathbb{Z}$ with the usual arithmetic
operations of addition and multiplication.

Let us note a connection among the concepts we have introduced. If all nonzero 
elements of a commutative ring form a group with respect to the operation of mul- 
tiplication, then such a ring is called a field. We assume that the reader is familiar 
with the simplest properties of fields and rings. 

The concept that generalizes both the concept of vector space (over some field
$\mathbb{K}$) with a linear transformation given on it and that of an abelian group is that of a
module.


Definition 13.26 An abelian group $M$ (its operation is written as addition) is a
module over a ring $R$ if there is defined an additional operation of multiplication
of the elements of the ring $R$ by elements of the module $M$ that produces elements
of the module and has the following properties:


$a(\mathbf{m} + \mathbf{n}) = a\mathbf{m} + a\mathbf{n},$
$(a + b)\mathbf{m} = a\mathbf{m} + b\mathbf{m},$
$(ab)\mathbf{m} = a(b\mathbf{m}),$
$1\mathbf{m} = \mathbf{m},$

for all elements $a, b \in R$ and all elements $\mathbf{m}, \mathbf{n} \in M$.


For convenience, we shall denote the elements of the ring using ordinary letters
$a, b, \ldots$, and elements of the module using boldface letters: $\mathbf{m}, \mathbf{n}, \ldots$.


Example 13.27 An example of a module that we have encountered repeatedly is
that of a vector space over an arbitrary field $\mathbb{K}$ (here the ring $R$ is the field $\mathbb{K}$). On
the other hand, every abelian group $G$ is a module over the ring of integers $\mathbb{Z}$: the
operation defined on it of integral multiplication $kg$ for $k \in \mathbb{Z}$ and $g \in G$ obviously
possesses all the required properties.


Example 13.28 Let $L$ be a vector space (real, complex, or over an arbitrary field $\mathbb{K}$)
and let $\mathcal{A}: L \to L$ be a fixed linear transformation. Then we may consider $L$ as a
module over the ring $R$ of polynomials in the single variable $x$ (real, complex, or
over a field $\mathbb{K}$), assuming, as we did earlier, for a polynomial $f(x) \in R$ and vector
$\mathbf{e} \in L$,


$f(x)\mathbf{e} = f(\mathcal{A})(\mathbf{e}).$ (13.27)



It is easily verified that all the properties appearing in the definition of a module are 
satisfied. 
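
To make formula (13.27) concrete, here is a minimal sketch of the action of a polynomial on a vector. It assumes that the transformation is given by a square matrix stored as a list of rows and that a polynomial is stored as its list of coefficients, lowest degree first; the names `mat_vec` and `act` are hypothetical.

```python
def mat_vec(A, v):
    """Product of the matrix A (list of rows) with the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def act(f, A, e):
    """Compute f(A)(e), where f = [c_0, c_1, ..., c_d] stands for c_0 + c_1 x + ... + c_d x^d."""
    result = [0] * len(e)
    power = list(e)                      # A^0 e
    for c in f:
        result = [r + c * p for r, p in zip(result, power)]
        power = mat_vec(A, power)        # next power of A applied to e
    return result

A = [[0, 1],
     [0, 0]]
e = [0, 1]

print(act([1, 2], A, e))      # (1 + 2x) e = e + 2 A(e)
print(act([0, 0, 1], A, e))   # x^2 e = A^2(e)
```

For this particular matrix the second call returns the zero vector, so the polynomial $x^2$ annihilates the vector.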


Our immediate objective will be to find a restriction of the general notion of 
module that covers vector spaces and abelian groups and then to prove theorems for 
these that generalize Theorems 5.12 and 13.22. 

These two examples—the ring of integers Z and the ring of polynomials in a 
single complex variable (for simplicity, we shall restrict our attention to the special 
case K = C, but many results are valid in the general case)—have many similar 
properties, the most important of which is the uniqueness of the decomposition into 
irreducible factors, that is, prime numbers in the case of the ring of integers, and 
linear polynomials in the case of the ring of polynomials with complex coefficients. 
Both of these properties, in turn, derive from a single property: the possibility of 
division with remainder, which we shall introduce in the definition of certain rings 
for which it is possible to generalize the reasoning from previous sections. 


Definition 13.29 A ring $R$ is called a Euclidean ring if

$ab \neq 0$ for all $a, b \in R$, $a \neq 0$ and $b \neq 0$,


and for nonzero elements $a$ of the ring, a function $g(a)$ is defined taking nonnegative
integer values and exhibiting the following properties:


(1) $g(ab) \ge g(a)$ for all elements $a, b \in R$, $a \neq 0$, $b \neq 0$.
(2) For all elements $a, b \in R$, where $a \neq 0$, there exist $q, r \in R$ such that


$b = aq + r$ (13.28)

and either $r = 0$ or $g(r) < g(a)$.


For the ring of integers, these properties are satisfied for g(a) = |a|, while for 
the ring of polynomials, they are satisfied for g(a) equal to the degree of the poly- 
nomial a. 


Definition 13.30 An element $a$ of a ring $R$ is called a unit or invertible element if
there exists an element $b \in R$ such that $ab = 1$. An element $b$ is called a divisor of
the element $a$ (one also says that $a$ is divisible by $b$ or that $b$ divides $a$) if there exists
an element $c$ such that $a = bc$.


Clearly the property of divisibility is unchanged under multiplication of $a$ or $b$
by a unit. Two elements that differ by a unit factor are called associates. For example,
in the ring of integers, the units are $+1$ and $-1$, and associates are integers that
are either equal or differ by a sign. In the ring of polynomials, the units are the
constant polynomials other than the one that is identically zero, and associates are
polynomials that differ from each other by a constant nonzero multiple.

An element p of a ring is prime if it is not a unit and has no divisors other than 
its associates and units. 



The theory of decomposition into prime factors in a Euclidean ring repeats ex- 
actly what is known for the ring of integers. 

If an element $a$ is not prime, then it has a divisor $b$ such that $a = bc$, where neither
$b$ nor $c$ is a unit. This means that $a$ is not a divisor of $b$, and there exists the
representation $b = aq + r$ with $r \neq 0$ and $g(r) < g(a)$. But $r = b - aq = b(1 - cq)$, and
therefore $g(r) \ge g(b)$, that is, $g(b) \le g(r) < g(a)$, which means that $g(b) < g(a)$.
Applying the same reasoning to $b$, we finally arrive at a prime divisor of $a$, and from
this it follows that every nonzero element that is not a unit can be represented as a
product of primes. The same argument as used in the case of integers or polynomials
shows the uniqueness of this decomposition in the following precise sense.


Theorem 13.31 If some element $a$ in a Euclidean ring $R$ has two factorizations
into prime factors,


$a = p_1 \cdots p_r, \quad a = q_1 \cdots q_s,$


then $r = s$, and with a suitable numeration of the factors, $p_i$ and $q_i$ are associates
for all $i$.


As in the ring of integers, in every Euclidean ring, each element $a \neq 0$ that is not
a unit can be written in the form


$a = u p_1^{n_1} \cdots p_r^{n_r},$


where $u$ is a unit, all the $p_i$ are prime elements with no two of them associates, and
$n_i$ are natural numbers. Such a representation is unique in a natural sense.

As in the ring of integers or of polynomials in one variable, representation (13.28)
for $r \neq 0$ can be applied to the elements $a$ and $r$ and repeated until we arrive at $r = 0$.
We will thus obtain a greatest common divisor (gcd) of the elements $a$ and $b$, that
is, a common divisor such that every other common divisor is a divisor of it. The
greatest common divisor of $a$ and $b$ is denoted by $d = (a, b)$ or $d = \gcd(a, b)$. This
process, as it is for integers, is called the Euclidean algorithm (whence the name
Euclidean ring). It follows from the Euclidean algorithm that a greatest common
divisor of elements $a$ and $b$ can be written in the form $d = ax + by$, where $x$ and $y$
are some elements of the ring $R$.
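
For the ring of integers, the Euclidean algorithm together with the representation $d = ax + by$ can be sketched as follows. The function name `extended_gcd` is an illustrative choice, and Python's floor division plays the role of division with remainder (13.28).

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d a greatest common divisor of a and b and d = a*x + b*y."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r                      # division with remainder
        old_r, r = r, old_r - q * r         # the remainder becomes the new divisor
        old_x, x = x, old_x - q * x         # keep old_r = a*old_x + b*old_y at every step
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

d, x, y = extended_gcd(240, 46)
assert d == 240 * x + 46 * y
print(d, x, y)
```

The same loop works in any Euclidean ring once the division step is replaced by the ring's own division with remainder; for negative inputs the result may be an associate of the usual positive gcd.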

Two elements a and b are said to be relatively prime if their only common di- 
visors are units. Then we may consider that gcd(a, b) = 1, and as follows from the 
Euclidean algorithm, there exist elements x, y € R such that 


ax + by =1. (13.29) 


Let us now recall that the theorem on Jordan normal form holds in the case 
of finite-dimensional vector spaces, and that the fundamental theorem of abelian 
groups holds for finite abelian groups. Let us now derive analogous finiteness con- 
ditions for modules. 



Definition 13.32 A module $M$ is said to be finitely generated if it contains a
finite collection of elements $\mathbf{m}_1, \ldots, \mathbf{m}_r$, called generators, such that every element
$\mathbf{m} \in M$ can be expressed in the form


$\mathbf{m} = a_1 \mathbf{m}_1 + \cdots + a_r \mathbf{m}_r$ (13.30)


for some elements $a_1, \ldots, a_r$ of the ring $R$.


For a vector space considered as a module over a certain field, this is the
definition of finite dimensionality, and representation (13.30) is a representation of a
vector $\mathbf{m}$ in the form of a linear combination of vectors $\mathbf{m}_1, \ldots, \mathbf{m}_r$ (let us note that
the system of vectors $\mathbf{m}_1, \ldots, \mathbf{m}_r$ will in general not be a basis, since we did not
introduce the concept of linear independence). In the case of a finite abelian group,
we may generally take for $\mathbf{m}_1, \ldots, \mathbf{m}_r$ all the elements of the group.

Let us formulate one additional condition of the same type. 


Definition 13.33 An element $\mathbf{m}$ of a module $M$ over a ring $R$ is said to be a torsion
element if there exists an element $a_{\mathbf{m}} \neq 0$ of the ring $R$ such that


$a_{\mathbf{m}} \mathbf{m} = \mathbf{0},$


where $\mathbf{0}$ is the null element of the module $M$, and the subscript in $a_{\mathbf{m}}$ is introduced
to show that this element depends on $\mathbf{m}$. A module is called a torsion module if all
of its elements are torsion elements.


In a finitely generated torsion module, there is an element $a \neq 0$ of the ring $R$
such that $a\mathbf{m} = \mathbf{0}$ for all elements $\mathbf{m} \in M$. Indeed, it suffices to set
$a = a_{\mathbf{m}_1} \cdots a_{\mathbf{m}_r}$ for the elements $\mathbf{m}_1, \ldots, \mathbf{m}_r$ in representation (13.30). If the
ring $R$ is Euclidean, then a product of nonzero elements is nonzero, and we can
conclude that $a \neq 0$. For the case of a finite abelian group, we may take
$a$ to be the order of the group.


Example 13.34 Let $M$ be a module determined by a vector space $L$ of dimension
$n$ and by a linear transformation $\mathcal{A}$ according to formula (13.27). For an arbitrary
vector $\mathbf{e} \in L$, let us consider the vectors


$\mathbf{e}, \ \mathcal{A}(\mathbf{e}), \ \ldots, \ \mathcal{A}^n(\mathbf{e}).$


Their number, $n + 1$, is greater than the dimension $n$ of the space $L$, and therefore,
these vectors are linearly dependent, which means that there exists a polynomial
$f(x)$, not identically zero, such that $f(\mathcal{A})(\mathbf{e}) = \mathbf{0}$, that is, in our module $M$, the
element $\mathbf{e}$ is a torsion element.


But if, as we did in Example 13.27, we view a vector space as a module over
the field $\mathbb{R}$ or $\mathbb{C}$, then not a single nonnull vector will be a torsion element of the
module.

Let $M$ be a module over a ring $R$. A subgroup $M'$ of the group $M$ is called a
submodule if for all elements $a \in R$ and $\mathbf{m}' \in M'$, we have $a\mathbf{m}' \in M'$.



Example 13.35 It is obvious that every subgroup of an abelian group viewed as a
module over the ring of integers is a submodule. Analogously, for a vector space
viewed as a module over a ring coinciding with a suitable field, every subspace is a
submodule. If $M$ is a module defined by a vector space $L$ and a linear transformation
$\mathcal{A}$ of $L$ according to formula (13.27), then as is easily verified, every submodule of
$M$ is a vector subspace that is invariant with respect to the transformation $\mathcal{A}$.

If $M' \subset M$ is a submodule, and $\mathbf{m}$ is any element of the module $M$, then it is
easily verified that the collection of all elements of the form $a\mathbf{m} + \mathbf{m}'$, where $a$ is
an arbitrary element of the ring $R$, and $\mathbf{m}'$ is an arbitrary element of the submodule
$M'$, is a submodule. We shall denote it by $(\mathbf{m}, M')$.

Since we are assuming that the ring $R$ is Euclidean, it follows that for every
torsion element $\mathbf{m} \in M$, there exists a nonzero element $a \in R$ that exhibits the
property $a\mathbf{m} = \mathbf{0}$ and is such that $g(a)$ is the smallest value among all nonzero
elements with this property. Then every element $c$ for which $c\mathbf{m} = \mathbf{0}$ is divisible by
$a$. Indeed, if such were not the case, we would have the relationship


$c = aq + r, \quad r \neq 0, \quad g(r) < g(a),$


and clearly $r\mathbf{m} = \mathbf{0}$, which contradicts the definition of $a$. In particular, two such
elements $a$ and $a'$ divide each other; that is, they are associates. The element $a \in R$
is called the order of the element $\mathbf{m} \in M$. One must keep in mind that this expression
is not quite precise, since order is defined only up to associates.
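
For a finite abelian group viewed as a module over the integers, this notion of order is the usual order of an element. The sketch below takes the integers modulo $n$ as the module (an illustrative assumption) and checks that every integer annihilating an element is divisible by its order. The function name `order` is hypothetical.

```python
from math import gcd

def order(m, n):
    """Smallest natural number a with a*m = 0 in Z_n."""
    a = 1
    while (a * m) % n != 0:
        a += 1
    return a

n, m = 12, 8
a = order(m, n)

# every c with c*m = 0 is divisible by the order a
annihilators = [c for c in range(1, 100) if (c * m) % n == 0]
assert all(c % a == 0 for c in annihilators)

# for a cyclic group this agrees with the closed formula n / gcd(n, m)
assert a == n // gcd(n, m)
```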


Example 13.36 If, as in Example 13.28, a module is a vector space $L$ viewed as a
module over the ring of polynomials with the aid of formula (13.27), then every
element $\mathbf{e} \in L$ is a torsion element, and its order is the same as the minimal
polynomial of the vector $\mathbf{e}$ (see the definition on p. 146), and the indicated property (every
element $c$ for which $c\mathbf{m} = \mathbf{0}$ is divisible by the order of the element $\mathbf{m}$) coincides
with Theorem 4.23.


Definition 13.37 A submodule $M'$ of a module $M$ is said to be cyclic if it contains
an element $\mathbf{m}'$ such that all the elements of the module $M'$ can be represented in the
form $a\mathbf{m}'$ with some $a \in R$. This is written $M' = \{\mathbf{m}'\}$.


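Definition 13.38 A module $M$ is called the direct sum of its submodules $M_1, \ldots,
M_r$ if every element $\mathbf{m} \in M$ can be written as a sum


$\mathbf{m} = \mathbf{m}_1 + \cdots + \mathbf{m}_r, \quad \mathbf{m}_i \in M_i,$

and such a representation is unique. It is obvious that to establish the uniqueness of
this decomposition, it suffices to prove that if $\mathbf{m}_1 + \cdots + \mathbf{m}_r = \mathbf{0}$, $\mathbf{m}_i \in M_i$, then
$\mathbf{m}_i = \mathbf{0}$ for all $i$. The fact that $M$ is such a direct sum can be written as the equality


$M = M_1 \oplus \cdots \oplus M_r.$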



The fundamental theorem that we shall prove, which contains Theorem 5.12 on 
the Jordan normal form and Theorem 13.22 on finite abelian groups as special cases, 
is the following. 


Theorem 13.39 Every finitely generated torsion module $M$ over a Euclidean ring
$R$ is the direct sum of cyclic submodules


$M = C_1 \oplus \cdots \oplus C_r, \quad C_i = \{\mathbf{m}_i\},$ (13.31)


such that the order of each element $\mathbf{m}_i$ is a power of a prime element of the ring $R$.


Example 13.40 If $M$ is a finite abelian group viewed as a module over the ring
of integers, then this theorem reduces directly to the fundamental theorem of finite
abelian groups (Theorem 13.22).

Let the module $M$ be determined by the finite-dimensional complex vector space
$L$ and the linear transformation $\mathcal{A}$ of $L$ according to formula (13.27). Then the $C_i$
are vector subspaces invariant with respect to $\mathcal{A}$, and in each of these, there exists a
vector $\mathbf{m}_i$ such that all the remaining vectors can be written in the form $f(\mathcal{A})(\mathbf{m}_i)$.
The prime elements in the ring of complex polynomials are the polynomials of the
form $x - \lambda$. By assumption, for each vector $\mathbf{m}_i$, there exist some $\lambda_i$ and a natural
number $n_i$ such that


$(\mathcal{A} - \lambda_i \mathcal{E})^{n_i}(\mathbf{m}_i) = \mathbf{0}.$

If we take the smallest possible value $n_i$, then as proved in Sect. 5.1, the vectors

$\mathbf{m}_i, \quad (\mathcal{A} - \lambda_i \mathcal{E})(\mathbf{m}_i), \quad \ldots, \quad (\mathcal{A} - \lambda_i \mathcal{E})^{n_i - 1}(\mathbf{m}_i)$


will form a basis of this subspace, that is, $C_i$ is a cyclic subspace corresponding to
the principal vector $\mathbf{m}_i$. We obtain the fundamental theorem on Jordan form
(Theorem 5.12).


Let us recall that we proved Theorem 5.12 by induction on the dimension of the
space. More precisely, for a linear transformation $\mathcal{A}$ on the space $L$, we constructed
a subspace $L'$ invariant with respect to $\mathcal{A}$ of dimension 1 less and proved the theorem
for $L$ on the assumption that it had been proved already for $L'$. In fact, this meant
that we constructed a sequence of nested subspaces


$L = L_0 \supset L_1 \supset L_2 \supset \cdots \supset L_n \supset L_{n+1} = (0),$ (13.32)


invariant with respect to $\mathcal{A}$ and such that $\dim L_{i+1} = \dim L_i - 1$. Then we reduced
the proof of Theorem 5.12 for $L$ to the proof of the theorem for $L_1$, then for $L_2$,
and so on. Now our first goal will be to construct in every finitely generated torsion
module a sequence of submodules analogous to the sequence of subspaces (13.32).


Lemma 13.41 In every finitely generated torsion module M over a Euclidean ring
R, there exists a sequence of submodules

$$M = M_0 \supset M_1 \supset M_2 \supset \cdots \supset M_n \supset M_{n+1} = \{0\} \tag{13.33}$$

such that $M_i \neq M_{i+1}$, $M_i = \langle m_i, M_{i+1} \rangle$, where $m_i$ are elements of the module M,
and for each of these, there exists a prime element $p_i$ of the ring R such that
$p_i m_i \in M_{i+1}$.


Proof By the definition of a finitely generated module, there exists a finite number
of generators $m_1, \ldots, m_r \in M$ such that the elements $a_1 m_1 + \cdots + a_r m_r$ exhaust all
the elements of the module M as $a_1, \ldots, a_r$ run through all elements of the ring R.
The collection of elements of the form $a_k m_k + \cdots + a_r m_r$, where $a_k, \ldots, a_r$ are all
possible elements of the ring R, obviously forms a submodule of the module M. Let
us denote it by $\overline{M}_k$. It is obvious that $\overline{M}_k \supset \overline{M}_{k+1}$ and $\overline{M}_k = \langle m_k, \overline{M}_{k+1} \rangle$. Without
loss of generality, we may assume that $m_k \notin \overline{M}_{k+1}$, since otherwise, the element
$m_k$ can be excluded from among the generators. The constructed chain of submodules
$\overline{M}_k$ is still not the chain of submodules $M_i$ that figures in Lemma 13.41. We
obtain that chain from the chain of submodules $\overline{M}_k$ by putting several intermediate
submodules between the modules $\overline{M}_k$ and $\overline{M}_{k+1}$.

Since $m_k \in M$ is a torsion element, there exists a nonzero element $a \in R$ for which
$a m_k = 0$ and in particular, $a m_k \in \overline{M}_{k+1}$. Let a be a nonzero element of the ring R for
which $a m_k \in \overline{M}_{k+1}$ and $\varphi(a)$ assumes the smallest value among elements with this
property. If the element a is prime, then we set $p_i = a$, and then it is unnecessary to
place a submodule between $\overline{M}_k$ and $\overline{M}_{k+1}$. But if a is not prime, then let $p_1$ be one
of its prime divisors and $a = p_1 b$. Let us set $m_{k,1} = b m_k$ and $M_{k,1} = \langle m_{k,1}, \overline{M}_{k+1} \rangle$.
Then clearly, $p_1 m_{k,1} \in \overline{M}_{k+1}$ and $b m_k \in M_{k,1}$. As we have seen, $\varphi(b) < \varphi(a)$ (strict
inequality). Therefore, repeating this process a finite number of times, we will place
a finite number of submodules (13.33) with the required properties between $\overline{M}_k$ and
$\overline{M}_{k+1}$.
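
The refinement step in this proof can be followed concretely in the simplest case. The sketch below assumes the ring of integers and the module Z/n with the single generator 1, so that the element a of smallest size with a·1 = 0 is n itself; the function names are hypothetical.

```python
def prime_factors(n):
    """Return the prime factors of n with multiplicity, in nondecreasing order."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def refined_chain(n):
    """Return generators m_0, m_1, ... and primes p_0, p_1, ... of a chain in Z/n.

    Each step splits off one prime divisor, as in the proof: m_{i+1} = p_i * m_i.
    """
    primes = prime_factors(n)
    gens, m = [], 1
    for p in primes:
        gens.append(m)
        m *= p
    return gens, primes

def submodule(m, n):
    """Return the submodule of Z/n generated by m."""
    return {(a * m) % n for a in range(n)}

n = 12
gens, primes = refined_chain(n)
sizes = [len(submodule(m, n)) for m in gens]
print(gens, primes, sizes)
```

For n = 12 the chain consists of the submodules generated by 1, 2, and 4, followed by {0}. Each step multiplies the generator by one prime, so the number of steps equals the number of prime factors of n counted with multiplicity.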


Remark 13.42 It is possible to show that the length of every chain of the form
(13.33) satisfying the conditions of Lemma 13.41 is the same number n. Moreover,
every chain of submodules

$$M = M_0 \supset M_1 \supset M_2 \supset \cdots \supset M_m$$

in which $M_i \neq M_{i+1}$ has length $m \le n$, and this holds with much milder restrictions
on the ring R and module M than we have assumed in this chapter. What is of
essence here is only that between any two neighboring submodules $M_i$ and $M_{i+1}$,
there does not exist an "intermediate" submodule $M_i'$ different from $M_i$ and $M_{i+1}$
such that $M_i \supset M_i' \supset M_{i+1}$.

For example, let us consider an n-dimensional vector space L over a field K as
a module over the ring R = K. Let $a_1, \ldots, a_n$ be some basis. Then the subspaces
$L_i = \langle a_i, \ldots, a_n \rangle$, $i = 1, \ldots, n$, have the indicated property. Using this, we could
give a definition of the dimension of a vector space without appealing to the notion
of linear dependence. Thus the length n of all chains of the form (13.33) satisfying
the conditions of Lemma 13.41 is the "correct" generalization of dimension of a
space to finitely generated torsion modules.




The following lemma is analogous to the one we used in the proof of Theo- 
rems 5.12 and 13.22. 


Lemma 13.43 If the order of an element m of a module M is the power of a prime
element, $p^r m = 0$, and an element x of the cyclic submodule {m} is not divisible by
p (that is, not representable in the form x = py, where $y \in M$), then {m} = {x}.


Proof It is obvious that $\{x\} \subset \{m\}$. Thus it remains to show that $\{m\} \subset \{x\}$, and
for this, it suffices to ascertain that $m \in \{x\}$. By assumption, x = am, where a is
some element of the ring R. If a is divisible by p, then clearly, x is also divisible
by p. Indeed, if a = pb with some $b \in R$, then from the equality x = am, we obtain
x = py, where y = bm, contradicting the assumption that x is not divisible by p.
This means that a and p are relatively prime, and consequently, in view of the
uniqueness of the decomposition into prime elements of the ring R, a is also relatively
prime to $p^r$. Then on the basis of the Euclidean algorithm, we can find elements
u and v in R such that $au + p^r v = 1$. Multiplying both sides of this equality
by m and using the relationships am = x and $p^r m = 0$, we obtain that m = ux,
which means that $m \in \{x\}$.
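
The last step of this proof is constructive, and it can be traced in the ring of integers. The following sketch assumes the module Z/8 (so p = 2 and the order of the generator m = 1 is 2^3) and the multiplier a = 5; these values and the function name `extended_gcd` are chosen only for illustration.

```python
def extended_gcd(a, b):
    """Return (g, u, v) with a*u + b*v = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

p, r = 2, 3
n = p ** r          # the module is Z/8, generated by m = 1 of order p^r
m = 1
a = 5               # a is not divisible by p, so x = a*m is not divisible by p
x = (a * m) % n

g, u, v = extended_gcd(a, n)      # a*u + p^r*v = 1
recovered = (u * x) % n           # multiplying the identity by m gives m = u*x

print(g, u, v, recovered)
```

Since m is recovered as a multiple of x, the cyclic submodules generated by m and by x coincide.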


Lemma 13.44 Let $M_1$ be a submodule of the module M over a Euclidean ring
R such that $M = \langle m, M_1 \rangle$ and $M \neq M_1$. Then if for some $a, p \in R$, we have the
inclusions $am \in M_1$ and $pm \in M_1$, where the element p is prime, then a is divisible
by p.


Proof Let us assume that a is not divisible by p. Since the element p is prime,
we have (a, p) = 1, and from the Euclidean algorithm in the ring R, it follows that
there exist two elements $u, v \in R$ for which au + pv = 1. Multiplying both sides
of this equality by m, taking into account the inclusions $am \in M_1$ and $pm \in M_1$,
we obtain that $m \in M_1$. By definition, $\langle m, M_1 \rangle$ consists of elements bm + m' for all
possible $b \in R$ and $m' \in M_1$. Therefore, $M = \langle m, M_1 \rangle = M_1$, which contradicts the
assumption of the lemma.
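
A small numerical check may make the hypotheses of Lemma 13.44 more tangible. The sketch assumes the module Z/12 over the integers, the submodule of even residues as $M_1$, the element m = 1, and the prime p = 2; all of these are illustrative choices.

```python
n = 12
m1 = {(2 * k) % n for k in range(n)}    # the submodule M_1 of even residues in Z/12
m, p = 1, 2                             # M = <m, M_1>, M differs from M_1, and p*m lies in M_1

# Collect the integers a in a finite window for which a*m lies in M_1.
multipliers = [a for a in range(-24, 25) if (a * m) % n in m1]

print(multipliers)
```

Every multiplier found is even, in agreement with the conclusion that a is divisible by p.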


Proof of Theorem 13.39 The proof is an almost verbatim repetition of the proof
of Theorems 5.12 and 13.22. We may use induction on the length n of the chain
(13.33), that is, we may assume the theorem to be true for the module $M_1$. Let

$$M_1 = C_1 \oplus \cdots \oplus C_r, \tag{13.34}$$

where $C_i = \{c_i\}$ are cyclic submodules, and the order of each element $c_i$ is the
power of a prime element. By Lemma 13.41, $M = \langle m, M_1 \rangle$ and $pm \in M_1$, where p
is a prime element. Then based on the decomposition (13.34), we have

$$pm = z_1 + \cdots + z_r, \qquad z_i \in C_i. \tag{13.35}$$

We shall select those elements $z_i$ that are divisible by p. By a change in numeration,
we may assume that these are the first s − 1 terms. Let us set $z_i = p z_i'$ for
$i = 1, \ldots, s - 1$. We must now consider two cases.


Case 1: The number s − 1 is equal to r. Then pm = pm', where $m' = z_1' + \cdots + z_r'$.
Let us set $m - m' = \bar{m}$. It is obvious that $p\bar{m} = 0$. We shall prove that the module
M can be written in the form

$$M = \{\bar{m}\} \oplus C_1 \oplus \cdots \oplus C_r.$$

Indeed, by assumption, every element $x \in M$ can be represented in the form
x = am + y, where $a \in R$ and $y \in M_1$, which means also in the form $x = a\bar{m} + y'$,
where $y' = am' + y \in M_1$.

Let us prove that for two such representations

$$x = a\bar{m} + y, \qquad x = a'\bar{m} + y', \tag{13.36}$$

we have the equalities $a\bar{m} = a'\bar{m}$ and y = y'. From this it will follow that

$$M = \{\bar{m}\} \oplus M_1 = \{\bar{m}\} \oplus C_1 \oplus \cdots \oplus C_r,$$

which in our case, is relationship (13.31).

We obtain from equalities (13.36) that $\bar{a}\bar{m} = \bar{y}$, where $\bar{a} = a - a'$, $\bar{y} = y' - y$,
and by assumption, $\bar{y} \in M_1$. By Lemma 13.41, there exists a prime element p of the
ring R such that $pm \in M_1$, and this means that $p\bar{m} \in M_1$. By Lemma 13.44, from
the inclusions $\bar{a}\bar{m} \in M_1$ and $p\bar{m} \in M_1$, it follows that the element $\bar{a}$ is divisible
by p, that is, $\bar{a} = bp$ for some $b \in R$. From this, we obviously obtain that
$\bar{a}\bar{m} = b(p\bar{m}) = 0$. Consequently, $a\bar{m} = a'\bar{m}$ and y = y'.


Case 2: The number s − 1 is less than r. If an element $c_i$ has order $p_i^{r_i}$ and $p_i$ is
not an associate of p, then $p_i^{r_i}$ is not divisible by p, and therefore, every element of
the module $C_i = \{c_i\}$ is divisible by p, by Lemma 13.43. Therefore, among the chosen
s − 1 submodules $C_i$ are all those such that the order of the element $c_i$ is $p_i^{r_i}$, and $p_i$
is not an associate of p. Since the order of an element is in general defined only up
to replacing it by an associate, we may consider that in the remaining submodules
$C_s = \{c_s\}, \ldots, C_r = \{c_r\}$, the order of the element $c_i$ is a power of p.

By construction, in the decomposition (13.35), we have $z_i = p z_i'$, $z_i' \in C_i$, for all
$i = 1, \ldots, s - 1$. Setting $z_1' + \cdots + z_{s-1}' = z'$ and $m - z' = \bar{m}$, we obtain the equality

$$p\bar{m} = z_s + \cdots + z_r. \tag{13.37}$$

Since the order of the element $c_i$ for $i = s, \ldots, r$ is a power of p, the order of an
arbitrary element $z_i$ in the decomposition (13.37) is also a power of p. Let us denote
it by $p^{n_i}$. Obviously, we may choose the numeration of the terms in formula (13.37)
in such a way that the numbers $n_i$ do not decrease: $1 \le n_s \le n_{s+1} \le \cdots \le n_r$. Let us
prove that the order of the element $\bar{m}$ is equal to $p^{n_r + 1}$ and that we have the equality

$$M = \{\bar{m}\} \oplus C_1 \oplus \cdots \oplus C_{s-1} \oplus \cdots \oplus C_{r-1},$$

that is, in the decomposition, all submodules $C_i$ occur other than $C_r$. With this,
relationship (13.31) will be proved in the second case as well; that is, the proof of
Theorem 13.39 will be complete.




Multiplying both sides of equality (13.37) by $p^{n_r}$ and using the fact that $p^{n_r} z_i = 0$
for all $i = s, \ldots, r$, we obtain that $p^{n_r + 1}\bar{m} = 0$. If the order a of the element $\bar{m}$
is not an associate of $p^{n_r + 1}$, then it divides it, and is equal, up to an associate, to
$p^k$ for some $k < n_r + 1$. Multiplying relationship (13.37) by $p^{k-1}$ and using the
fact that the submodules $C_s, \ldots, C_r$ form a direct sum, we obtain that $p^{k-1} z_i = 0$
for all $i = s, \ldots, r$. In particular, $p^{k-1} z_r = 0$, and this contradicts the assumption
$k < n_r + 1$ and that the order of the element $z_r$ is equal to $p^{n_r}$. Thus the order of the
element $\bar{m}$ is equal to $p^{n_r + 1}$.

Let us note that by construction, in the decomposition (13.37), the element $z_r$ is
not divisible by p.

From what we have proved, on the basis of Lemma 13.43, it follows that
$\{z_r\} = \{c_r\} = C_r$. From this it follows that every element $m \in M$ can be represented as a
sum of elements of the modules

$$\{\bar{m}\}, C_1, \ldots, C_{s-1}, \ldots, C_{r-1}. \tag{13.38}$$

Indeed, an analogous assertion holds for the modules

$$\{\bar{m}\}, C_1, \ldots, C_{s-1}, \ldots, C_r, \tag{13.39}$$

since by our construction, $\bar{m} = m - z'$ and $z' = z_1' + \cdots + z_{s-1}'$, where $z_i' \in C_i$.
Consequently, $m = \bar{m} + z_1' + \cdots + z_{s-1}'$, which means that every element $m \in M$ is
a sum of elements of the modules (13.39).

We now must verify that every element of the submodule $C_r$ can be represented
as a sum of elements of the submodules (13.38). Since $C_r = \{z_r\}$, it suffices to verify
this for a single element $z_r$. But relationship (13.37) gives us precisely the required
representation:

$$z_r = p\bar{m} - z_s - \cdots - z_{r-1}.$$


It remains to verify the second condition entering into the definition of a direct sum:
that such a representation is unique. To this end, it suffices to prove that in the
relationship

$$a\bar{m} + f_1 + \cdots + f_{s-1} + \cdots + f_{r-1} = 0, \qquad f_i \in C_i, \tag{13.40}$$

all the terms must equal 0.

Indeed, from relationship (13.40), taking into account (13.34), it follows that
$a\bar{m} \in M_1$. But by the construction of the element $\bar{m}$, we also have $p\bar{m} \in M_1$.
By Lemma 13.44, from the inclusions $a\bar{m} \in M_1$ and $p\bar{m} \in M_1$, we have that the
element a is divisible by p, that is, a = bp for some $b \in R$. Furthermore, we know
that

$$p\bar{m} = z_s + \cdots + z_r,$$

and moreover, the order of the element $z_i$ is $p^{n_i}$, while the order of the element $\bar{m}$ is
$p^{n_r + 1}$. On substituting all these relationships into decomposition (13.40), we obtain

$$b(z_s + \cdots + z_r) + f_1 + \cdots + f_{s-1} + \cdots + f_{r-1} = 0.$$




Then it follows from formula (13.34) that $b z_r = 0$, and since the order of the element
$z_r$ is equal to $p^{n_r}$, we have that $p^{n_r}$ divides b. This means that the element a is
divisible by $p^{n_r + 1}$, and $a\bar{m} = 0$. But then from equality (13.40), it follows that
$f_1 + \cdots + f_{r-1} = 0$. Using again the induction hypothesis (13.34), we obtain that
$f_1 = 0, \ldots, f_{r-1} = 0$. This completes the proof of Theorem 13.39.


For Theorem 13.39, we have the same uniqueness theorem as in the case of
Theorem 5.12 and Theorem 13.22. Namely, if

$$M = C_1 \oplus \cdots \oplus C_r, \quad C_i = \{m_i\}, \qquad M = D_1 \oplus \cdots \oplus D_s, \quad D_j = \{n_j\}$$

are two decompositions of a finitely generated torsion module M in which the orders
of the elements $m_i$ and $n_j$ are prime powers, that is, $p_i^{r_i} m_i = 0$ and $q_j^{s_j} n_j = 0$, where
$p_i$ and $q_j$ are prime elements, then with a suitable numeration of the terms $C_i$ and
$D_j$, the elements $p_i$ and $q_i$ are associates, and $r_i = s_i$. However, a natural proof of this
theorem would require some new concepts, and we shall not pursue this here.


Chapter 14 
Elements of Representation Theory 


Representation theory is one of the most “applied” branches of algebra. It has many 
applications in various branches of mathematics and mathematical physics. In this 
chapter, we shall be concerned with the problem of finding all finite-dimensional 
representations of finite groups. But there is an analogous theory that has been devel- 
oped for certain types of infinite groups, which is important in many other branches 
of mathematics. 


14.1 Basic Concepts of Representation Theory 


Let us recall some definitions from the previous chapter that will play a key role 
here. 

A homomorphism of a group G into a group G' is a mapping $f : G \to G'$ such
that for every pair of elements $g_1, g_2 \in G$, we have the relationship

$$f(g_1 g_2) = f(g_1) f(g_2).$$

An isomorphism of a group G onto a group G' is a bijective homomorphism
$f : G \to G'$. Groups G and G' are said to be isomorphic if there exists an isomorphism
$f : G \to G'$ between them. This is denoted by $G \simeq G'$.


Definition 14.1 A representation of a group G is a homomorphism of G into the 
group of nonsingular linear transformations of a vector space L. The space L is called 
the space of the representation or the representation space, and its dimension, that 
is, dimL, is the dimension of the representation. 


Thus in order to specify a representation of a group G, it is necessary to associate
with each element $g \in G$ a nonsingular linear transformation $\mathcal{A}_g : L \to L$ such that
for $g_1, g_2 \in G$, the condition

$$\mathcal{A}_{g_1 g_2} = \mathcal{A}_{g_1} \mathcal{A}_{g_2} \tag{14.1}$$


I.R. Shafarevich, A.O. Remizov, Linear Algebra and Geometry,
DOI 10.1007/978-3-642-30994-6_14, © Springer-Verlag Berlin Heidelberg 2013




is satisfied. Since the group of nonsingular linear transformations of an
n-dimensional vector space is isomorphic to the group of nonsingular square matrices
of order n, to give a representation, it suffices to associate with each element $g \in G$
a nonsingular square matrix $A_g$ such that (14.1) is satisfied.

It follows at once from (14.1) that for a representation $\mathcal{A}_g$ and any number of
elements $g_1, \ldots, g_k$ of the group G, we have the relationship

$$\mathcal{A}_{g_1 \cdots g_k} = \mathcal{A}_{g_1} \cdots \mathcal{A}_{g_k}. \tag{14.2}$$

Moreover, it is obvious that if e is the identity element of G, then

$$\mathcal{A}_e = \mathcal{E}, \tag{14.3}$$

where $\mathcal{E}$ is the identity linear transformation of the space L. And if $g^{-1}$ is the inverse
of the element g, then

$$\mathcal{A}_{g^{-1}} = \mathcal{A}_g^{-1}, \tag{14.4}$$

that is, $\mathcal{A}_{g^{-1}}$ is the transformation that is the inverse of $\mathcal{A}_g$.

Example 14.2 Let $G = \mathrm{GL}_n$ be the group of nonsingular square matrices of order n.
For each matrix $g \in \mathrm{GL}_n$, let us set

$$\mathcal{A}_g = |g|.$$

Since |g| is a number, which by assumption is different from zero, we have a
one-dimensional representation. It is obvious that for every integer n, the equality

$$\mathcal{B}_g = |g|^n$$

will also define a one-dimensional representation.

Example 14.3 Let $G = S_n$ be the symmetric group of degree n, that is, the group of
permutations of an n-element set M, and let L be a vector space of dimension n, in
which we have chosen a basis $e_1, \ldots, e_n$. For the permutation

$$g = \begin{pmatrix} 1 & 2 & \cdots & n \\ j_1 & j_2 & \cdots & j_n \end{pmatrix},$$

let us define $\mathcal{A}_g$ as the linear transformation such that

$$\mathcal{A}_g(e_1) = e_{j_1}, \quad \mathcal{A}_g(e_2) = e_{j_2}, \quad \ldots, \quad \mathcal{A}_g(e_n) = e_{j_n}.$$

Then we obtain an n-dimensional representation of the group $S_n$.

To avoid having to use a specific numeration of the elements of the set M, let
us associate with the element $a \in M$ the basis vector $e_a$. Then the representation
described above is given by the formula

$$\mathcal{A}_g(e_a) = e_b \quad \text{if } g(a) = b,$$

for every transformation $g : M \to M$.
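
Condition (14.1) for this representation can be verified mechanically for a small n. The sketch below takes n = 3, numbers the elements of M from 0, and assumes the convention that the product gh acts as (gh)(a) = g(h(a)); the function names are hypothetical.

```python
from itertools import permutations

def compose(g, h):
    """Return the permutation gh acting as (gh)(a) = g(h(a)); permutations are tuples on 0..n-1."""
    return tuple(g[h[a]] for a in range(len(h)))

def perm_matrix(g):
    """Matrix of A_g in the basis e_0, ..., e_{n-1}: column a is the basis vector e_{g(a)}."""
    n = len(g)
    return [[1 if g[a] == b else 0 for a in range(n)] for b in range(n)]

def mat_mul(x, y):
    """Product of two square matrices of the same order."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

group = list(permutations(range(3)))

# Relationship (14.1): the matrix of gh equals the product of the matrices of g and h.
is_homomorphism = all(
    perm_matrix(compose(g, h)) == mat_mul(perm_matrix(g), perm_matrix(h))
    for g in group for h in group
)
print(is_homomorphism)
```

The check runs over all 36 pairs of elements of $S_3$, and the identity permutation is sent to the identity matrix, in agreement with (14.3).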




Example 14.4 Let $G = S_3$ be the symmetric group of degree 3, and let L be a
two-dimensional space with basis $e_1, e_2$. Let us define a vector $e_3$ by $e_3 = -(e_1 + e_2)$.
For the permutation

$$g = \begin{pmatrix} 1 & 2 & 3 \\ j_1 & j_2 & j_3 \end{pmatrix},$$

let us define $\mathcal{A}_g$ as the transformation such that

$$\mathcal{A}_g(e_1) = e_{j_1}, \quad \mathcal{A}_g(e_2) = e_{j_2}.$$

It is easily verified that in this way, we obtain a two-dimensional representation of
the symmetric group $S_3$.
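
The verification left to the reader in Example 14.4 can also be done by direct computation. The following sketch numbers the vectors $e_1, e_2, e_3$ as 0, 1, 2 and again assumes that the product gh acts as (gh)(a) = g(h(a)); the names `E` and `rep_matrix` are hypothetical.

```python
from itertools import permutations

# Coordinates of e_1, e_2 and e_3 = -(e_1 + e_2) in the basis e_1, e_2.
E = {0: (1, 0), 1: (0, 1), 2: (-1, -1)}

def compose(g, h):
    """Return the permutation gh acting as (gh)(a) = g(h(a))."""
    return tuple(g[h[a]] for a in range(len(h)))

def rep_matrix(g):
    """2x2 matrix whose columns are the images of e_1 and e_2 under A_g."""
    c0, c1 = E[g[0]], E[g[1]]
    return [[c0[0], c1[0]], [c0[1], c1[1]]]

def mat_mul(x, y):
    """Product of two square matrices of the same order."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

group = list(permutations(range(3)))
is_homomorphism = all(
    rep_matrix(compose(g, h)) == mat_mul(rep_matrix(g), rep_matrix(h))
    for g in group for h in group
)
distinct = len({tuple(map(tuple, rep_matrix(g))) for g in group})
print(is_homomorphism, distinct)
```

The six permutations give six distinct matrices, and relationship (14.1) holds for every pair.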


Example 14.5 Let $G = \mathrm{GL}_2$ be the group of nonsingular matrices of order 2, and
let L be the space of polynomials in the two variables x and y whose total degree in
both variables does not exceed n. For a nonsingular matrix

$$g = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

let us define $\mathcal{A}_g$ as the linear transformation of the space L taking polynomials
f(x, y) to f(ax + by, cx + dy), that is,

$$\mathcal{A}_g\bigl(f(x, y)\bigr) = f(ax + by, cx + dy).$$

It is easy to verify that relationship (14.1) is satisfied in this case, that is, we have
a representation of the group of nonsingular matrices of order 2. Its dimension is
equal to the dimension of the space of polynomials in x and y whose degree (in
both variables combined) does not exceed n; that is, as is easily seen, it is equal to
$(n + 1)(n + 2)/2$.
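
The dimension formula can be confirmed by counting the monomials $x^i y^j$ with $i + j \le n$, which form a basis of the space L. A minimal sketch (the function name `monomials` is hypothetical):

```python
def monomials(n):
    """Exponent pairs (i, j) of the monomials x^i y^j of total degree at most n."""
    return [(i, j) for i in range(n + 1) for j in range(n + 1 - i)]

# Compare the count with (n + 1)(n + 2)/2 for several values of n.
counts = [(n, len(monomials(n)), (n + 1) * (n + 2) // 2) for n in range(6)]
print(counts)
```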


Example 14.6 For any group and an n-dimensional space L, the representation defined
by the formula $\mathcal{A}_g = \mathcal{E}$, where $\mathcal{E}$ is the identity transformation on the space L,
is called the n-dimensional identity representation.


In the definition of a representation, the space L can also be infinite-dimensional. 
In this case, the representation is also said to be infinite-dimensional. For example, 
defining a representation just as in Example 14.5, but taking for L the space of all 
continuous functions, we obtain an infinite-dimensional representation. In the se- 
quel, we shall consider only finite-dimensional representations, and we shall always 
consider the space L to be complex. 


Example 14.7 Representations of the symmetric group $S_n$ are of interest in many
problems. All such representations are known, but we shall describe here only the
one-dimensional representations of the group $S_n$. In this case, a nonsingular linear
transformation $\mathcal{A}_g$ is given by a matrix of order 1, that is, a single complex number
(which, of course, is nonzero). We thereby arrive at a function on the group taking
numeric values. Let us denote this function by $\varphi(g)$. Then by definition, it must
satisfy the conditions $\varphi(g) \neq 0$ and

$$\varphi(gh) = \varphi(g)\varphi(h) \tag{14.5}$$

for all elements g and h in the group $S_n$.

It is easy to find all possible values $\varphi(\tau)$ if $\tau$ is a transposition. Namely, setting
$g = h = \tau$ and using the facts that $\tau^2 = e$ (the identity transformation) and that
obviously, $\varphi(e) = 1$, we obtain from relationship (14.5) the equality $\varphi(\tau)^2 = 1$, from
which follows $\varphi(\tau) = \pm 1$. It is theoretically possible that for some transpositions,
$\varphi(\tau) = 1$, while for others, $\varphi(\tau) = -1$. However, in reality, such is not the case, and
one of the equalities $\varphi(\tau) = 1$ and $\varphi(\tau) = -1$ holds for all transpositions $\tau$, with
the choice of sign depending only on the one-dimensional representation $\varphi$. Let us
prove this.

Let $\tau = \tau_{a,b}$ and $\tau' = \tau_{c,d}$ be two transpositions, where a, b, c, d are elements of
the set M (see formula (13.3)). Obviously, there exists a permutation g of the set M
such that g(c) = a and g(d) = b. Then as is easily verified, based on the definition
of a transposition, we have the equality $g^{-1} \tau_{a,b}\, g = \tau_{c,d}$, that is, $\tau' = g^{-1} \tau g$. In
view of relationships (14.2), (14.4), and (14.5), we obtain from the last equality that

$$\varphi(\tau') = \varphi(g)^{-1} \varphi(\tau) \varphi(g) = \varphi(\tau),$$

which proves our assertion for all transpositions $\tau$ and $\tau'$. We shall now make use
of the fact that every element g of the group $S_n$ is the product of a finite number
of transpositions; see formula (13.4). Taking the aforesaid into account, it follows
from this that

$$\varphi(g) = \varphi(\tau_{a_1,b_1}) \varphi(\tau_{a_2,b_2}) \cdots \varphi(\tau_{a_k,b_k}) = \varphi(\tau)^k, \tag{14.6}$$

where $\varphi(\tau) = +1$ or $-1$.

Thus there are two possible cases. The first case is that for all transpositions
$\tau \in S_n$, the number $\varphi(\tau)$ is equal to 1. In view of formula (14.6), for every permutation
$g \in S_n$, we have $\varphi(g) = 1$, that is, the function $\varphi$ on $S_n$ is identically equal to
1, and therefore, it gives the one-dimensional identity representation of the group $S_n$.
The second case is that for all transpositions $\tau \in S_n$, we have $\varphi(\tau) = -1$. Then, in
view of formula (14.6), for a permutation $g \in S_n$, we have $\varphi(g) = (-1)^k$, where the
parity of k is the parity of the permutation g. In other words, $\varphi(g) = 1$ if the permutation
g is even, and $\varphi(g) = -1$ if the permutation g is odd. From relationship
(13.4), it follows at once that such a function $\varphi$ indeed determines a one-dimensional
representation of the group $S_n$, which we denote by $\varepsilon(g)$.

Thus we have obtained the following result: the symmetric group $S_n$ has exactly
two one-dimensional representations: the identity and $\varepsilon(g)$.
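
The second of these representations can be checked on a small group. The sketch below takes n = 4 and computes the sign of a permutation from its number of inversions, which is one standard way of obtaining the parity (the book instead counts transpositions in a product); the function names are hypothetical.

```python
from itertools import permutations

def compose(g, h):
    """Return the permutation gh acting as (gh)(a) = g(h(a))."""
    return tuple(g[h[a]] for a in range(len(h)))

def sign(g):
    """Sign of the permutation g, computed from the parity of its number of inversions."""
    n = len(g)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if g[i] > g[j])
    return -1 if inversions % 2 else 1

group = list(permutations(range(4)))

# Relationship (14.5) for the function sign.
multiplicative = all(sign(compose(g, h)) == sign(g) * sign(h) for g in group for h in group)

# Transpositions are the permutations moving exactly two elements.
transpositions = [g for g in group if sum(1 for a in range(4) if g[a] != a) == 2]
print(multiplicative, {sign(t) for t in transpositions})
```

All six transpositions of $S_4$ receive the same value −1, as the argument above predicts for a one-dimensional representation different from the identity.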


One-dimensional representations of the group $S_n$ and related groups (such as the
alternating group $A_n$) play a large role in a variety of questions in algebra. For example,
one of the best-known results in algebra is the derivation of formulas for
the solution of equations of degrees 3 and 4. For a long time, mathematicians were
thwarted in their attempts to find analogous formulas for equations of degree 5 and
higher. Finally, it was proved that such an attempt was futile, that is, that there exists
no formula that expresses the roots of a polynomial equation of degree 5 or greater
in terms of its coefficients using the usual arithmetic operations and the extraction
of roots of arbitrary degree. A key point in the proof of this assertion was the establishment
of the fact that the alternating group $A_n$ for $n \ge 5$ has no one-dimensional
representation other than the identity. For n = 3 and 4, such representations of the
group $A_n$ exist, and that is what explains the existence of formulas for the solution
of equations of those degrees.

Now let us establish what representations we shall consider to be identical.


Definition 14.8 Two representations $g \mapsto \mathcal{A}_g$ and $g \mapsto \mathcal{A}_g'$ of the same group G
with spaces L and L' of the same dimension are said to be equivalent if there exists
an isomorphism $\mathcal{C} : L' \to L$ of the vector spaces L' and L such that

$$\mathcal{A}_g' = \mathcal{C}^{-1} \mathcal{A}_g \mathcal{C} \tag{14.7}$$

for every element $g \in G$.


Let $e_1', \ldots, e_n'$ be a basis of the space L' and let $e_1 = \mathcal{C}(e_1'), \ldots, e_n = \mathcal{C}(e_n')$ be
the corresponding basis of the space L, since the linear transformation $\mathcal{C} : L' \to L$
is an isomorphism. Comparing relationship (14.7) with the change-of-matrix formula
(3.43), we see that this definition means that the matrix of the transformation
$\mathcal{A}_g'$ with basis $e_1', \ldots, e_n'$ coincides with the matrix of the transformation $\mathcal{A}_g$ with
basis $e_1, \ldots, e_n$. Thus the representations $\mathcal{A}_g$ and $\mathcal{A}_g'$ are equivalent if and only if
one can choose bases in the spaces L and L' such that for each element $g \in G$, the
transformations $\mathcal{A}_g : L \to L$ and $\mathcal{A}_g' : L' \to L'$ have identical matrices.

Let $g \mapsto \mathcal{A}_g$ be a representation of the group G, and let L be its representation
space. A subspace $M \subset L$ is said to be invariant with respect to the representation $\mathcal{A}_g$
if it is invariant with respect to all linear transformations $\mathcal{A}_g : L \to L$ for all $g \in G$.
Let us denote by $\mathcal{B}_g$ the restriction of $\mathcal{A}_g$ to the subspace M. It is obvious that $\mathcal{B}_g$
is a representation of the group G with representation space M. The representation
$\mathcal{B}_g$ is said to be the representation induced by the representation $\mathcal{A}_g$ with invariant
subspace M. This is also expressed by saying that the representation $\mathcal{B}_g$ is contained
in the representation $\mathcal{A}_g$.


Example 14.9 Let us consider the n-dimensional representation of the group $S_n$
described in Example 14.3. As is easily verified, the collection of all vectors of the
form $\sum_{a \in M} \alpha_a e_a$, where the $\alpha_a$ are arbitrary scalars satisfying $\sum_{a \in M} \alpha_a = 0$, forms
a subspace $L' \subset L$ of dimension n − 1, invariant with respect to this representation.
The representation thus induced in L' is an (n − 1)-dimensional representation of
the group $S_n$. In the case n = 3, it is equivalent to the representation of the group $S_3$
described in Example 14.4.


Example 14.10 In Example 14.5, let us denote by $M_k$ ($k = 0, \ldots, n$) the subspace
consisting of polynomials of degree at most k in the variables x and y. Each $M_k$
is an invariant subspace of every $M_l$ with index $l \ge k$.




Definition 14.11 A representation is said to be reducible if its representation space 
L has an invariant subspace different from (0) and from all of L. Otherwise, it is said 
to be irreducible. 


Examples 14.3 and 14.5 give reducible representations. Clearly, the n-dimen- 
sional identity representation is reducible if n > 1: every subspace of the represen- 
tation space is invariant. Every one-dimensional representation is irreducible. 

Let us prove that the representation in Example 14.4 is irreducible. Indeed, any
invariant subspace different from (0) and L must be one-dimensional. Let u be a
basis vector of such a subspace. The condition of invariance means that

$$\mathcal{A}_g(u) = \lambda_g u$$

for every $g \in S_3$, where $\lambda_g$ is some scalar depending on the element g, that is, u
is a common eigenvector for all transformations $\mathcal{A}_g$. It is easy to verify that this is
impossible: the eigenvectors of the transformation $\mathcal{A}_{g_1}$ with
$g_1 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}$ have the
form $\alpha(e_1 + e_2)$ and $\beta(e_1 - e_2)$, and the eigenvectors of the transformation $\mathcal{A}_{g_2}$ with
$g_2 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}$ have the form $\gamma e_2$ and $\delta(2e_1 + e_2)$, and these clearly cannot coincide.


Definition 14.12 A representation $\mathcal{A}_g$ is said to be the direct sum of the r representations

$$\mathcal{A}_g^{(1)}, \ldots, \mathcal{A}_g^{(r)}$$

if its representation space L is the direct sum of the r invariant subspaces

$$L = L_1 \oplus \cdots \oplus L_r, \tag{14.8}$$

and $\mathcal{A}_g$ induces in every $L_i$ a representation equivalent to $\mathcal{A}_g^{(i)}$, $i = 1, \ldots, r$.


Example 14.13 The n-dimensional identity representation is the direct sum of n 
one-dimensional identity representations. To convince oneself of this, it suffices to 
decompose the space of this representation in some way into a direct sum of one- 
dimensional subspaces. 


Example 14.14 In the situation of Example 14.9, let us denote by $L_1$ the invariant
subspace L' of dimension n − 1, and let us denote by $L_2$ the one-dimensional subspace
spanned by the vector $\sum_{a \in M} e_a$. Clearly, $L_2$ is also an invariant subspace
of this representation, and we have the decomposition $L = L_1 \oplus L_2$. In particular,
the representation introduced in Example 14.3, for n = 3, is the direct sum of the
representation of Example 14.4 and the one-dimensional identity representation.

It can happen that the representation space L has an invariant subspace $L_1$, yet it
is impossible to find a complementary invariant subspace $L_2$ such that $L = L_1 \oplus L_2$.
In other words, the representation is reducible, but it is not the direct sum of two
other representations.


Example 14.15 Let G = {g} be an infinite cyclic group, and let L be a two-dimensional
space with basis $e_1, e_2$. Let us denote by $\mathcal{A}_n$ the transformation having
in this basis the matrix $\begin{pmatrix} 1 & 0 \\ n & 1 \end{pmatrix}$. It is obvious that $\mathcal{A}_n \mathcal{A}_m = \mathcal{A}_{n+m}$. From this, it follows
that on setting $\mathcal{A}_{g^n} = \mathcal{A}_n$, we obtain a representation of the group G. The line
$L_1 = \langle e_2 \rangle$ is an invariant subspace: $\mathcal{A}_n(e_2) = e_2$. However, there are no other invariant
subspaces different from (0) and L. For instance, the transformation $\mathcal{A}_1$ has no eigenvectors other
than multiples of $e_2$. Therefore, our representation is reducible, but it is not a direct sum.
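
The claims of Example 14.15 are easy to test numerically. The sketch below assumes that the matrix of the n-th transformation has rows (1, 0) and (n, 1), with columns holding the coordinates of the images of $e_1$ and $e_2$; this choice of convention is an assumption made for the illustration, and the function names are hypothetical.

```python
def a(n):
    """Assumed matrix of the n-th transformation in the basis e_1, e_2."""
    return [[1, 0], [n, 1]]

def mat_mul(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_vec(x, v):
    """Product of a 2x2 matrix and a column vector."""
    return [x[0][0] * v[0] + x[0][1] * v[1], x[1][0] * v[0] + x[1][1] * v[1]]

# The product of the matrices for n and m is the matrix for n + m.
additive = all(mat_mul(a(n), a(m)) == a(n + m) for n in range(-5, 6) for m in range(-5, 6))

# e_2 is fixed by every transformation.
fixes_e2 = all(mat_vec(a(n), [0, 1]) == [0, 1] for n in range(-5, 6))

# A vector (x, y) with x != 0 is never proportional to its image under a(1):
# the 2x2 determinant formed by the vector and its image is nonzero.
no_other_eigenvectors = all(
    x * mat_vec(a(1), [x, y])[1] - y * mat_vec(a(1), [x, y])[0] != 0
    for x in range(-3, 4) if x != 0 for y in range(-3, 4)
)
print(additive, fixes_e2, no_other_eigenvectors)
```

The last check covers only a finite grid of integer vectors; the general statement follows from the same determinant, which equals $x^2$.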


Let us note that in Example 14.15, the group G was infinite. It turns out that for
finite groups, such a phenomenon cannot occur. Namely, in the following section,
it will be proved that if a representation $\mathcal{A}_g$ of a finite group is reducible, that is,
the vector space L of this representation contains an invariant subspace $L_1$, then
L is the direct sum of $L_1$ and another invariant subspace $L_2$. Hence it follows that
every representation of a finite group is the direct sum of irreducible representations.
As regards irreducible representations, it will be proved in Sect. 14.3 that (up to
equivalence) there is only a finite number of them.

From this point on, to the end of this book, we shall always assume that a group 
G is finite, with the sole exception of Example 14.36. 


14.2 Representations of Finite Groups 


The proof of the fundamental property of representations of finite groups formulated 
at the end of the preceding section uses several properties of complex vector spaces. 

Let us consider a representation of a finite group G. Let L be its representation
space. Let us define on L some Hermitian form $\varphi(x, y)$ for which the corresponding
quadratic Hermitian form $\psi(x) = \varphi(x, x)$ is positive definite, and thus it takes
positive values for all $x \neq 0$. For example, if $L = \mathbb{C}^n$, then for vectors x and y with
coordinates $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$, let us set

$$\varphi(x, y) = \sum_{i=1}^{n} x_i \overline{y}_i.$$

In the sequel, we shall denote $\varphi(x, y)$ by (x, y) and call it a scalar product in the
space L. The concepts and simple results that we proved in Chap. 7 for Euclidean
spaces can be transferred to this case verbatim. Let us list those of them that we are
now going to use:


1. The orthogonal complement of a subspace $L' \subset L$ is the collection of all vectors
$y \in L$ for which (x, y) = 0 for all $x \in L'$. The orthogonal complement of
a subspace L' is itself a subspace of L and is denoted by $(L')^\perp$. We have the
decomposition $L = L' \oplus (L')^\perp$.

2. A unitary transformation (the analogue of orthogonal transformation for the case
of a complex space) is a linear transformation $\mathcal{U} : L \to L$ such that for all vectors
$x, y \in L$, we have the relationship

$$(\mathcal{U}(x), \mathcal{U}(y)) = (x, y).$$




3. The complex analogue of Theorem 7.24 is this: if a subspace $L' \subset L$ is invariant
with respect to a unitary transformation $\mathcal{U}$, then its orthogonal complement $(L')^\perp$
is also invariant with respect to $\mathcal{U}$.


Definition 14.16 A representation $\mathcal{U}_g$ of a group G is said to be unitarizable if it
is possible to introduce a scalar product on its representation space L such that all
transformations $\mathcal{U}_g$ become unitary.


The property of a representation being unitarizable obviously remains true under
a change to an equivalent representation.

Indeed, let $g \mapsto \mathcal{U}_g$ be a unitarizable representation of some group G with space
L and Hermitian form $\varphi(x, y)$. Let us consider an arbitrary isomorphism $\mathcal{C} : L' \to L$.
As we know, it determines an equivalent representation $g \mapsto \mathcal{U}_g'$ of the same group
with space L'. Let us show that the representation $g \mapsto \mathcal{U}_g'$ is also unitarizable. As
the scalar product in L' let us choose the form defined by the relationship

$$\psi(u, v) = \varphi(\mathcal{C}(u), \mathcal{C}(v)) \tag{14.9}$$

for vectors $u, v \in L'$. It is obvious that $\psi(u, v)$ is a Hermitian form on L' and that
$\psi(u, u) > 0$ for every nonnull vector $u \in L'$. Let us verify that the scalar product
$\psi(u, v)$ indeed establishes the unitarizability of the representation $g \mapsto \mathcal{U}_g'$. Substituting
the vectors $\mathcal{U}_g'(u)$ and $\mathcal{U}_g'(v)$ into equality (14.9), taking into account (14.7)
and the unitarizability of the representation $g \mapsto \mathcal{U}_g$, we obtain the relationship

$$\psi(\mathcal{U}_g'(u), \mathcal{U}_g'(v)) = \psi(\mathcal{C}^{-1}\mathcal{U}_g\mathcal{C}(u), \mathcal{C}^{-1}\mathcal{U}_g\mathcal{C}(v)) = \varphi(\mathcal{U}_g\mathcal{C}(u), \mathcal{U}_g\mathcal{C}(v)) = \varphi(\mathcal{C}(u), \mathcal{C}(v)) = \psi(u, v),$$

which means that the representation $g \mapsto \mathcal{U}_g'$ is unitarizable.

Lemma 14.17 If a space L of a unitarizable representation $\mathcal{U}_g$ of a group G contains
an invariant subspace L', then it also contains a second invariant subspace L''
such that $L = L' \oplus L''$.


Proof Let us take as L'' the orthogonal complement $(L')^\perp$. Then the space L'' is
invariant with respect to all transformations $\mathcal{U}_g$, and we have the decomposition
$L = L' \oplus L''$.


The application of this lemma to representations of finite groups is based on the 
following fundamental fact. 


Theorem 14.18 Every representation $\mathcal{A}_g$ of a finite group $G$ is unitarizable.

Proof Let us introduce a scalar product on the representation space $\mathsf{L}$ in such a way
that all linear transformations $\mathcal{A}_g$ become unitary. For this, let us take an arbitrary
scalar product $[x, y]$ in the space $\mathsf{L}$, defined by an arbitrary Hermitian form $\varphi(x, y)$
such that the associated quadratic form $\varphi(x, x)$ is positive definite: $\varphi(x, x) > 0$ for
every $x \neq 0$. Let us now set

$$(x, y) = \sum_{g \in G} [\mathcal{A}_g(x), \mathcal{A}_g(y)], \tag{14.10}$$

where the sum is taken over all elements $g$ of the group $G$. We shall prove that
$(x, y)$ is also a scalar product and that with respect to it, all transformations $\mathcal{A}_g$ are
unitary.
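The required properties of a scalar product for $(x, y)$ derive from the analogous
properties of $[x, y]$ and from the fact that $\mathcal{A}_g$ is a linear transformation:

1. $(y, x) = \sum_{g \in G} [\mathcal{A}_g(y), \mathcal{A}_g(x)] = \sum_{g \in G} \overline{[\mathcal{A}_g(x), \mathcal{A}_g(y)]} = \overline{(x, y)}$,
2. $(\lambda x, y) = \sum_{g \in G} [\mathcal{A}_g(\lambda x), \mathcal{A}_g(y)] = \sum_{g \in G} \lambda [\mathcal{A}_g(x), \mathcal{A}_g(y)] = \lambda (x, y)$,
3. $(x_1 + x_2, y) = \sum_{g \in G} [\mathcal{A}_g(x_1 + x_2), \mathcal{A}_g(y)] = \sum_{g \in G} [\mathcal{A}_g(x_1) + \mathcal{A}_g(x_2), \mathcal{A}_g(y)] = (x_1, y) + (x_2, y)$,
4. $(x, x) = \sum_{g \in G} [\mathcal{A}_g(x), \mathcal{A}_g(x)] > 0$ if $x \neq 0$.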


For the proof of the last property, it is necessary to observe that in this sum, all
terms $[\mathcal{A}_g(x), \mathcal{A}_g(x)]$ are positive. This follows from the analogous property of the
scalar product $[x, y]$, that is, from the fact that $[x, x] > 0$ for all $x \neq 0$, because the
linear transformation $\mathcal{A}_g : \mathsf{L} \to \mathsf{L}$ is nonsingular and therefore takes every
nonnull vector $x$ to a nonnull vector $\mathcal{A}_g(x)$.

Let us now verify that with respect to the scalar product $(x, y)$, every transformation
$\mathcal{A}_h$, $h \in G$, is unitary. In view of (14.10), we have

$$\begin{aligned}
(\mathcal{A}_h(x), \mathcal{A}_h(y)) &= \sum_{g \in G} [\mathcal{A}_g(\mathcal{A}_h(x)), \mathcal{A}_g(\mathcal{A}_h(y))] \\
&= \sum_{g \in G} [\mathcal{A}_g\mathcal{A}_h(x), \mathcal{A}_g\mathcal{A}_h(y)].
\end{aligned} \tag{14.11}$$

Let us set $gh = u$. In view of property (14.1), we have
$\mathcal{A}_g\mathcal{A}_h = \mathcal{A}_{gh} = \mathcal{A}_u$. Therefore, we may rewrite equality (14.11) in the form

$$(\mathcal{A}_h(x), \mathcal{A}_h(y)) = \sum_{u = gh} [\mathcal{A}_u(x), \mathcal{A}_u(y)]. \tag{14.12}$$

Let us now observe that as $g$ runs through all elements of the group $G$ while $h$
is fixed, the element $u = gh$ also runs through all elements of the group $G$. This
follows from the fact that for every element $u \in G$, the element $g = uh^{-1}$ satisfies
the relationship $gh = u$, and that for distinct $g_1$ and $g_2$, we thereby obtain distinct
elements $u_1$ and $u_2$.

Thus in equality (14.12), the element $u$ runs through the entire group $G$, and we
can rewrite this equality in the form

$$(\mathcal{A}_h(x), \mathcal{A}_h(y)) = \sum_{g \in G} [\mathcal{A}_g(x), \mathcal{A}_g(y)],$$

whence in view of definition (14.10), it follows that $(\mathcal{A}_h(x), \mathcal{A}_h(y)) = (x, y)$, that
is, the transformation $\mathcal{A}_h$ is unitary with respect to the scalar product $(x, y)$.
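
The averaging in formula (14.10) can be tried out on a small case. The following is a minimal
sketch, not part of the original text, assuming the standard Hermitian form as the initial scalar
product $[x, y]$ (so its Gram matrix is the identity) and a hypothetical cyclic group of order 3
generated by an integer matrix `a`. The helper names (`powers`, `averaged_gram`, `preserves`) are
illustrative only. A transformation with matrix $A$ preserves the form with Gram matrix $G$
exactly when $A^* G A = G$.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def adjoint(a):
    """Conjugate transpose of a matrix."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def powers(a):
    """Matrices E, a, a^2, ... of the cyclic group generated by a.

    Assumes that a has finite order; otherwise the loop does not stop.
    """
    e = identity(len(a))
    mats, cur = [e], a
    while cur != e:
        mats.append(cur)
        cur = matmul(cur, a)
    return mats


def averaged_gram(mats):
    """Gram matrix of (x, y) = sum over g of [A_g x, A_g y], as in (14.10).

    [x, y] is taken to be the standard Hermitian form (Gram matrix E),
    so the averaged form has Gram matrix sum over g of A_g^* A_g.
    """
    n = len(mats[0])
    gram = [[0] * n for _ in range(n)]
    for m in mats:
        term = matmul(adjoint(m), m)
        gram = [[gram[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return gram


def preserves(m, gram):
    """True if m is unitary for the form with the given Gram matrix."""
    return matmul(adjoint(m), matmul(gram, m)) == gram


# Hypothetical example: an integer matrix with a^3 = E that is not
# unitary with respect to the standard form.
a = [[0, -1], [1, -1]]
mats = powers(a)
gram = averaged_gram(mats)

print(len(mats))                               # order of the group
print(preserves(a, identity(2)))               # standard form
print(all(preserves(m, gram) for m in mats))   # averaged form
```

Integer entries are used so that the comparisons are exact; with floating-point matrices the
equality test in `preserves` would have to be replaced by a tolerance.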


Corollary 14.19 If the space $\mathsf{L}$ of a representation of a finite group contains an
invariant subspace $\mathsf{L}'$, then it contains another invariant subspace $\mathsf{L}''$ such that
$\mathsf{L} = \mathsf{L}' \oplus \mathsf{L}''$.


This follows directly from Lemma 14.17 and from Theorem 14.18. 


Corollary 14.20 Every representation of a finite group is a direct sum of irreducible 
representations. 


Proof If the space $\mathsf{L}$ of our representation $\mathcal{A}_g$ does not have an invariant subspace
different from (0) and all of $\mathsf{L}$, then this representation itself is irreducible, and our
assertion is true (although trivially so). But if the space $\mathsf{L}$ has such an invariant subspace
$\mathsf{L}'$, then by Corollary 14.19, there exists an invariant subspace $\mathsf{L}''$ such that
$\mathsf{L} = \mathsf{L}' \oplus \mathsf{L}''$.

Let us apply the same argument to each of the spaces $\mathsf{L}'$ and $\mathsf{L}''$. Continuing this
process, we will eventually come to a halt, since the dimensions of the obtained
subspaces are continually decreasing. As a result, we arrive at a decomposition
(14.8) with some number $r \geq 2$ such that the invariant subspaces $\mathsf{L}_i$ contain
no invariant subspaces other than (0) and all of $\mathsf{L}_i$. This means precisely that the
representations $\mathcal{A}_g^{(1)}, \ldots, \mathcal{A}_g^{(r)}$ induced in the subspaces
$\mathsf{L}_1, \ldots, \mathsf{L}_r$ by our representation $\mathcal{A}_g$ are irreducible, and the
representation $\mathcal{A}_g$ decomposes as a direct sum

$$\mathcal{A}_g^{(1)} \oplus \cdots \oplus \mathcal{A}_g^{(r)}.$$


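Theorem 14.21 If a representation $\mathcal{A}_g$ decomposes into a direct sum of irreducible
representations $\mathcal{A}_g^{(1)}, \ldots, \mathcal{A}_g^{(r)}$, then every irreducible representation
$\mathcal{B}_g$ contained in $\mathcal{A}_g$ is equivalent to one of the $\mathcal{A}_g^{(i)}$.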


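Proof Let $\mathsf{L} = \mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_r$ be a decomposition of the space $\mathsf{L}$
of the representation $\mathcal{A}_g$ into a direct sum of invariant subspaces such that $\mathcal{A}_g$
induces in $\mathsf{L}_i$ the representation $\mathcal{A}_g^{(i)}$, and let $\mathsf{M}$ be the invariant
subspace of $\mathsf{L}$ in which $\mathcal{A}_g$ induces the representation $\mathcal{B}_g$.

Then in particular, for every vector $x \in \mathsf{M}$, we have the decomposition

$$x = x_1 + \cdots + x_r, \quad x_i \in \mathsf{L}_i. \tag{14.13}$$

It determines a linear transformation $\mathcal{P}_i : \mathsf{M} \to \mathsf{L}_i$ that is the projection of
the subspace $\mathsf{M}$ onto $\mathsf{L}_i$ parallel to
$\mathsf{L}_1 \oplus \cdots \oplus \mathsf{L}_{i-1} \oplus \mathsf{L}_{i+1} \oplus \cdots \oplus \mathsf{L}_r$;
see Example 3.51 on p. 103. In other words, the transformations
$\mathcal{P}_i : \mathsf{M} \to \mathsf{L}_i$ are defined by the conditions

$$\mathcal{P}_i(x) = x_i, \quad i = 1, \ldots, r. \tag{14.14}$$

The proof of the theorem is based on the relationships

$$\mathcal{A}_g\mathcal{P}_i(x) = \mathcal{P}_i\mathcal{A}_g(x), \quad i = 1, \ldots, r, \tag{14.15}$$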


which are valid for every vector $x \in \mathsf{M}$. For the proof of relationships (14.15), let us
apply the transformation $\mathcal{A}_g$ to both sides of equality (14.13). We then obtain

$$\mathcal{A}_g(x) = \mathcal{A}_g(x_1) + \cdots + \mathcal{A}_g(x_r). \tag{14.16}$$

Since $\mathcal{A}_g(x) \in \mathsf{M}$ and $\mathcal{A}_g(x_i) \in \mathsf{L}_i$, $i = 1, \ldots, r$, it follows
that relationship (14.16) is decomposition (14.13) for the vector $\mathcal{A}_g(x)$, whence follows
equality (14.15).


From the irreducibility of the representations $\mathcal{A}_g^{(1)}, \ldots, \mathcal{A}_g^{(r)}$ and
$\mathcal{B}_g$, it follows that the projection $\mathcal{P}_i$ defined by formula (14.14) is either
identically zero or an isomorphism of the spaces $\mathsf{M}$ and $\mathsf{L}_i$. Indeed, let the vector
$x \in \mathsf{M}$ be contained in the kernel of the transformation $\mathcal{P}_i$, that is,
$\mathcal{P}_i(x) = 0$. Then clearly, $\mathcal{A}_g\mathcal{P}_i(x) = 0$, and in view of relationship (14.15),
we obtain that $\mathcal{P}_i\mathcal{A}_g(x) = 0$, that is, the vector $\mathcal{A}_g(x)$ is also contained in
the kernel of $\mathcal{P}_i$. The kernel is thus an invariant subspace of $\mathsf{M}$, and from the
irreducibility of the representation $\mathcal{B}_g$ it now follows that the kernel either is equal to
(0) or coincides with the entire space $\mathsf{M}$ (in the latter case, the projection $\mathcal{P}_i$
will obviously be the null transformation). In exactly the same way, from equality (14.15) and the
irreducibility of $\mathcal{A}_g^{(i)}$, it follows that the image of the transformation $\mathcal{P}_i$
either equals (0) or coincides with the subspace $\mathsf{L}_i$.

However, there is certainly at least one index $i$ among the numbers $1, \ldots, r$
for which the transformation $\mathcal{P}_i$ is not identically zero. Indeed, let us take an
arbitrary nonnull vector $x \in \mathsf{M}$. At least one of its components $x_i$ in the decomposition
(14.13) is not equal to zero, and therefore, $\mathcal{P}_i(x) \neq 0$. Taking into account the
previous arguments, this shows that the corresponding transformation $\mathcal{P}_i$ is an
isomorphism of the vector spaces $\mathsf{M}$ and $\mathsf{L}_i$, and relationship (14.15) shows the
equivalence of the corresponding representations $\mathcal{B}_g$ and $\mathcal{A}_g^{(i)}$.


Corollary 14.22 In a given representation are contained only finitely many 
distinct—in the sense of equivalence—irreducible representations. 


Indeed, all irreducible representations contained in the given one are equivalent 
to one of those encountered in an arbitrary decomposition of this representation as 
a direct sum of irreducible representations. 


Remark 14.23 From Theorem 14.21 there follows a certain property of uniqueness
of the decompositions of a representation into irreducible representations. Namely,
however we decompose a representation, we shall encounter in the decomposition
the same (up to equivalence) irreducible representations. Indeed, let us select a certain
decomposition of our representation into irreducible representations. An irreducible
representation encountered in any other decomposition appears in our representation,
which means that by Theorem 14.21, it is equivalent to one of the terms
of the chosen decomposition. A stronger property of uniqueness consists in the fact
that if in one decomposition there appear $k$ terms equivalent to a given irreducible
representation, then the same number of such terms will appear as well in every
other decomposition. We shall not require this assertion in the sequel, and we shall
therefore not prove it.


14.3 Irreducible Representations 


In this section, we shall prove that a finite group has only a finite number of distinct 
(up to equivalence) irreducible representations. We shall accomplish this as follows: 
We shall construct one particularly important representation called a regular rep- 
resentation, for which we then shall prove that every irreducible representation is 
contained within it. The finiteness of the number of such representations will then 
result from Corollary 14.22. The space of a regular representation consists of all 
possible functions on the group. This is a special case of the general notion of the 
space of functions on an arbitrary set (see Example 3.36, p. 94). 

For an arbitrary finite group $G$, let us consider the vector space $\mathsf{M}(G)$ of functions
on this group. Since the group $G$ is finite, the space $\mathsf{M}(G)$ has finite dimension:
$\dim \mathsf{M}(G) = |G|$.


Definition 14.24 The regular representation of a group $G$ is the representation $\mathcal{R}_g$
whose representation space is the space $\mathsf{M}(G)$ of functions on the group $G$, and in
which the element $g \in G$ is associated with the linear transformation $\mathcal{R}_g$ that takes
the function $f(h) \in \mathsf{M}(G)$ to the function $\varphi(h) = f(hg)$:

$$(\mathcal{R}_g(f))(h) = f(hg). \tag{14.17}$$

Formula (14.17) means that the result of applying the linear transformation $\mathcal{R}_g$
to the function $f$ is a “translated” function, in the sense that the value of $\mathcal{R}_g(f)$ on
the element $h \in G$ is equal to $f(hg)$. We shall omit the obvious verification of the
fact that the transformation of the space $\mathsf{M}(G)$ thus obtained is linear. Let us verify
that $\mathcal{R}_g$ is a representation, that is, that it satisfies the requirements (14.1).

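Let us set $\mathcal{R}_{g_1 g_2}(f) = \varphi$. By formula (14.17), we have

$$\varphi(h) = f(h g_1 g_2).$$

Let $\mathcal{R}_{g_2}(f) = \psi$. Then

$$\psi(u) = f(u g_2).$$

Finally, if $\mathcal{R}_{g_1}\mathcal{R}_{g_2}(f) = \varphi_1$, then $\varphi_1 = \mathcal{R}_{g_1}(\psi)$ and
$\varphi_1(u) = \psi(u g_1)$. Substituting $u g_1$ for $u$ in the previous formula, we obtain that
$\varphi_1(u) = \psi(u g_1) = f(u g_1 g_2)$ for every element $u \in G$. This means that
$\varphi = \varphi_1$ and $\mathcal{R}_{g_1 g_2} = \mathcal{R}_{g_1}\mathcal{R}_{g_2}$.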


Example 14.25 Let $G$ be a group of order two, consisting of elements $e$ and $g$,
where $g^2 = e$. A particular instance of this group is $S_2$, the symmetric group of
degree 2. The space $\mathsf{M}(G)$ is two-dimensional, and every function $f \in \mathsf{M}(G)$ is
defined by two numbers, $\alpha = f(e)$ and $\beta = f(g)$, that is, it can be identified with
the vector $(\alpha, \beta)$. As with any representation, $\mathcal{R}_e$ is the identity transformation. Let
us determine what $\mathcal{R}_g$ is. By formula (14.17), we have

$$(\mathcal{R}_g(f))(e) = f(g) = \beta, \quad (\mathcal{R}_g(f))(g) = f(g^2) = f(e) = \alpha.$$

This means that the linear transformation $\mathcal{R}_g$ takes the vector $(\alpha, \beta)$ to the vector
$(\beta, \alpha)$, that is, it represents a reflection with respect to the line $\alpha = \beta$.
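
Formula (14.17) and the example above can be replayed mechanically. The following is a small
sketch under stated assumptions: group elements are stored as permutation tuples, a function on
the group is stored as a dictionary, and the names `mul` and `regular` are introduced here only
for illustration. It reproduces the swap $(\alpha, \beta) \mapsto (\beta, \alpha)$ for the group
of order two and checks the relationship $\mathcal{R}_{g_1 g_2} = \mathcal{R}_{g_1}\mathcal{R}_{g_2}$
on the nonabelian group $S_3$, where the order of the factors matters.

```python
from itertools import permutations


def mul(p, q):
    """Product pq of permutations stored as tuples: (pq)(i) = p(q(i))."""
    return tuple(p[i] for i in q)


def regular(g, f, group):
    """The function R_g(f) of formula (14.17): h -> f(hg).

    A function on the group is stored as a dict {element: value}.
    """
    return {h: f[mul(h, g)] for h in group}


# Group of order two from Example 14.25, realized as S_2.
s2 = list(permutations(range(2)))
e, g = (0, 1), (1, 0)
f = {e: "alpha", g: "beta"}
swapped = regular(g, f, s2)

# Check of R_{g1 g2} = R_{g1} R_{g2} on S_3.
s3 = list(permutations(range(3)))
f3 = {h: i for i, h in enumerate(s3)}  # arbitrary function with distinct values
is_representation = all(
    regular(mul(g1, g2), f3, s3) == regular(g1, regular(g2, f3, s3), s3)
    for g1 in s3
    for g2 in s3
)

print(swapped)
print(is_representation)
```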


Theorem 14.26 Every irreducible representation of a finite group $G$ is contained
in its regular representation $\mathcal{R}_g$.

Proof Let $\mathcal{A}_g$ be an irreducible representation with space $\mathsf{L}$. Let us denote by $l$ an
arbitrary nonnull linear function on the space $\mathsf{L}$ and let us associate with each vector
$x \in \mathsf{L}$ the function $f(h) = l(\mathcal{A}_h(x)) \in \mathsf{M}(G)$ obtained when the vector $x$ is
fixed and the element $h$ runs through all elements of the group $G$. It is obvious that
in this way, we obtain a linear transformation $\mathcal{C} : \mathsf{L} \to \mathsf{M}'$ defined by the
relationship

$$\mathcal{C}(x) = l(\mathcal{A}_h(x)), \tag{14.18}$$

where $\mathsf{M}'$ is some subspace of the vector space $\mathsf{M}(G)$. Here by construction,
$\mathcal{C}(\mathsf{L}) = \mathsf{M}'$, that is, $\mathsf{M}'$ is the image of the transformation $\mathcal{C}$.
We shall prove the following properties: 


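(1) For all elements $g \in G$ and vectors $x \in \mathsf{L}$, we have the relationship

$$(\mathcal{C}\mathcal{A}_g)(x) = (\mathcal{R}_g\mathcal{C})(x). \tag{14.19}$$

(2) The subspace $\mathsf{M}'$ is invariant with respect to the representation $\mathcal{R}_g$.
(3) The transformation $\mathcal{C}$ is an isomorphism of the spaces $\mathsf{L}$ and $\mathsf{M}'$.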


Comparing formulas (14.19) and (14.7), taking into account the remaining two
properties, we conclude that the irreducible representation $\mathcal{A}_g$ is equivalent to the
representation induced by the regular representation $\mathcal{R}_g$ in the invariant subspace
$\mathsf{M}' \subset \mathsf{M}(G)$. By virtue of the definitions given above, this means that
$\mathcal{A}_g$ is contained in $\mathcal{R}_g$, as asserted in the statement of the theorem.


Proof of property (1). Let us set $\mathcal{C}(x) = f \in \mathsf{M}(G)$. Then by definition,
$f(h) = l(\mathcal{A}_h(x))$ for every element $h \in G$. Applying formula (14.17), we obtain the
relationship

$$(\mathcal{R}_g\mathcal{C})(x) = \mathcal{R}_g(f) = \varphi, \tag{14.20}$$

where $\varphi$ is the function on the group $G$ defined by the relationship
$\varphi(h) = l(\mathcal{A}_{hg}(x))$.

On the other hand, substituting the vector $\mathcal{A}_g(x)$ for $x$ in formula (14.18), we
obtain the equality

$$\mathcal{C}(\mathcal{A}_g(x)) = (\mathcal{C}\mathcal{A}_g)(x) = \varphi_1(h), \tag{14.21}$$

where the function $\varphi_1(h)$ is defined by the relationship

$$\varphi_1(h) = l(\mathcal{A}_h\mathcal{A}_g(x)) = l(\mathcal{A}_{hg}(x)),$$

and clearly, it coincides with $\varphi(h)$. Taking into account that $\varphi(h) = \varphi_1(h)$, we see
that equalities (14.20) and (14.21) yield that $(\mathcal{C}\mathcal{A}_g)(x) = (\mathcal{R}_g\mathcal{C})(x)$.


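Proof of property (2). We must prove that for every element $g \in G$, the image
$\mathcal{R}_g(\mathsf{M}')$ of the subspace $\mathsf{M}'$ under the linear transformation $\mathcal{R}_g$ is
contained in $\mathsf{M}'$. Let $f \in \mathsf{M}'$, that is, by the definition of the image,
$f = \mathcal{C}(x)$ for some $x \in \mathsf{L}$. Then taking into account formula (14.19)
proved above, we have the equality

$$\mathcal{R}_g(f) = (\mathcal{R}_g\mathcal{C})(x) = (\mathcal{C}\mathcal{A}_g)(x) = \mathcal{C}(y),$$

where the vector $y = \mathcal{A}_g(x)$ is in $\mathsf{L}$, and by our construction, this means that
$\mathcal{R}_g(f) \in \mathsf{M}'$. This proves the required inclusion $\mathcal{R}_g(\mathsf{M}') \subset \mathsf{M}'$.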


Proof of property (3). Since by construction, the space $\mathsf{M}'$ is the image of the
transformation $\mathcal{C} : \mathsf{L} \to \mathsf{M}'$, it remains only to show that the transformation
$\mathcal{C}$ is bijective, that is, that its kernel is equal to (0). This means that we must prove
that the equality $x = 0$ follows from the equality $\mathcal{C}(x) = 0'$ (where $0'$ denotes the
function identically equal to zero on the group $G$). Let us denote the kernel of the
transformation $\mathcal{C}$ by $\mathsf{L}'$. As we know, it is a subspace of $\mathsf{L}$. Let us show that
$\mathsf{L}'$ is invariant with respect to the representation $\mathcal{A}_g$.

Indeed, let us suppose that $\mathcal{C}(x) = 0'$ for some vector $x \in \mathsf{L}$, and let us set
$y = \mathcal{A}_g(x)$. On applying the transformation $\mathcal{C}$ to the vector $y$, taking into account
formula (14.19), we obtain

$$\mathcal{C}(y) = \mathcal{C}(\mathcal{A}_g(x)) = (\mathcal{R}_g\mathcal{C})(x) = \mathcal{R}_g(\mathcal{C}(x)) = \mathcal{R}_g(0') = 0'.$$

But from the irreducibility of the representation $\mathcal{A}_g$, it now follows that either
$\mathsf{L}' = \mathsf{L}$ or $\mathsf{L}' = (0)$. The former would mean that $l(\mathcal{A}_h(x)) = 0$ for all
$h \in G$ and $x \in \mathsf{L}$. But then even for $h = e$, we would have the equality
$l(\mathcal{A}_e(x)) = l(\mathcal{E}(x)) = l(x) = 0$ for all $x \in \mathsf{L}$, which is impossible, since in
the definition of the transformation $\mathcal{C}$, the function $l$ was chosen to be not identically
zero. This means that the subspace $\mathsf{L}'$ is equal to (0), which is what was to be proved.


Corollary 14.27 A finite group has only a finite number of distinct (up to equiva- 
lence) irreducible representations. 


Example 14.28 Let $\mathcal{A}_g$ be the one-dimensional identity representation of the
group $G$. Then the space $\mathsf{L}$ is one-dimensional. Let $\boldsymbol{e}$ be a basis of $\mathsf{L}$.
Let us define the function $l$ by the condition $l(\alpha\boldsymbol{e}) = \alpha$. Formula (14.18) gives
for the vector $x = \alpha\boldsymbol{e}$ the value

$$\mathcal{C}(\alpha\boldsymbol{e}) = f, \quad \text{where } f(h) = l(\mathcal{A}_h(\alpha\boldsymbol{e})) = l(\alpha\boldsymbol{e}) = \alpha.$$

Thus to the vector $\alpha\boldsymbol{e}$ is associated the function $f$, which takes for all $h \in G$ the
same value $\alpha$. Obviously, such constant functions indeed form an invariant subspace
with respect to the regular representation, and the representation induced in it is the
identity, as asserted by Theorem 14.26.
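
The transformation $\mathcal{C}$ of formula (14.18) and relationship (14.19) can also be checked on
a less trivial case. The sketch below is an illustration under stated assumptions, not an example
from the text: the group is the cyclic group of order 4 written additively as $\{0, 1, 2, 3\}$, the
element $k$ acts on the plane by the $k$th power of the rotation `J` through 90 degrees, and the
linear function `l` is the first coordinate. This representation is not irreducible over the
complex numbers, but properties (1) and (2) of the proof do not use irreducibility, so the
identity $\mathcal{C}\mathcal{A}_g = \mathcal{R}_g\mathcal{C}$ must still hold.

```python
N = 4                    # order of the cyclic group {0, 1, 2, 3} under addition mod N
J = [[0, -1], [1, 0]]    # rotation through 90 degrees; its fourth power is the identity


def matvec(m, x):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]


def A(k, x):
    """Apply A_k = J^k to the vector x."""
    for _ in range(k % N):
        x = matvec(J, x)
    return x


def l(x):
    """A nonnull linear function on the plane (an arbitrary choice)."""
    return x[0]


def C(x):
    """The function h -> l(A_h(x)) of formula (14.18), stored as a list of values."""
    return [l(A(h, x)) for h in range(N)]


def R(g, f):
    """Regular representation (14.17) in additive notation: (R_g f)(h) = f(h + g)."""
    return [f[(h + g) % N] for h in range(N)]


x = [3, 5]
image = C(x)
intertwines = all(C(A(g, x)) == R(g, image) for g in range(N))

print(image)
print(intertwines)
```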


14.4 Representations of Abelian Groups 


Let us first of all recall that we are assuming throughout that the space L of a repre- 
sentation is complex. 


Theorem 14.29 An irreducible representation of an abelian group is one-dimen- 
sional. 


Proof Let $g$ be a fixed element of the group $G$. Its associated linear transformation
$\mathcal{A}_g : \mathsf{L} \to \mathsf{L}$ has at least one eigenvalue $\lambda$. Let $\mathsf{M} \subset \mathsf{L}$ be
the eigensubspace corresponding to the eigenvalue $\lambda$, that is, the collection of all vectors
$x \in \mathsf{L}$ such that

$$\mathcal{A}_g(x) = \lambda x. \tag{14.22}$$

By construction, $\mathsf{M} \neq (0)$. We shall now prove that $\mathsf{M}$ is an invariant subspace of our
representation. It will then follow from the irreducibility of the representation that
$\mathsf{M} = \mathsf{L}$, and then equality (14.22) will hold for every vector $x \in \mathsf{L}$. In other words,
$\mathcal{A}_g = \lambda\mathcal{E}$, and the matrix of the transformation $\mathcal{A}_g$ is equal to $\lambda E$.
A matrix of this type is called a scalar matrix. This reasoning holds for every $g \in G$; we have
only to note that the eigenvalue $\lambda$ in formula (14.22) depends on the element $g$, and the
remainder of the argument does not depend on it. Thus we may conclude that the
matrices of all transformations $\mathcal{A}_g$ are scalar matrices, and if $\dim \mathsf{L} > 1$, then every
subspace of the space $\mathsf{L}$ is invariant. Consequently, if a representation is irreducible,
it is one-dimensional.

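It remains to prove the invariance of the subspace $\mathsf{M}$. It is here that we shall
specifically use the commutativity of the group $G$. Let $x \in \mathsf{M}$, $h \in G$. We shall
prove that $\mathcal{A}_h(x) \in \mathsf{M}$. Indeed, if $\mathcal{A}_h(x) = y$, then

$$\begin{aligned}
\mathcal{A}_g(y) &= \mathcal{A}_g(\mathcal{A}_h(x)) = \mathcal{A}_{gh}(x) = \mathcal{A}_{hg}(x) = \mathcal{A}_h(\mathcal{A}_g(x)) = \mathcal{A}_h(\lambda x) \\
&= \lambda\mathcal{A}_h(x) = \lambda y,
\end{aligned}$$

that is, the vector $y$ belongs to $\mathsf{M}$.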


In view of Theorem 14.29, every irreducible representation of an abelian group
can be represented in the form $\mathcal{A}_g = \chi(g)$, where $\chi(g)$ is a number. Condition
(14.1) can then be written in the following form:

$$\chi(g_1 g_2) = \chi(g_1)\chi(g_2). \tag{14.23}$$

Definition 14.30 A function $\chi(g)$ on an abelian group $G$ taking complex values
and satisfying relationship (14.23) is called a character.


By Theorem 14.29, every irreducible representation of a finite abelian group is
a character $\chi(g)$. On the other hand, it follows from Theorem 14.26 that this
representation is contained in the regular representation. In other words, in the space
$\mathsf{M}(G)$ of functions on the group $G$, there exists an invariant subspace $\mathsf{M}'$ in which
the regular representation induces a representation equivalent to ours. Since our
representation is one-dimensional, the subspace $\mathsf{M}'$ is also one-dimensional. Let some
function $f \in \mathsf{M}(G)$ be a basis in $\mathsf{M}'$. Then since the representation induced by the
regular representation in $\mathsf{M}'$ has matrix $\chi(g)$, and $(\mathcal{R}_g(f))(h) = f(hg)$, we must have
the relationship

$$f(hg) = \chi(g) f(h).$$

Let us set $h = e$ in this equality and let us also set $f(e) = \alpha$. We obtain that
$f(g) = \alpha\chi(g)$, that is, we may take as a basis of the subspace $\mathsf{M}'$ the character $\chi$ itself
(indeed, it is a function on $G$, and this means that $\chi \in \mathsf{M}(G)$). As we have seen,
we then have $\mathsf{M}(G) = \mathsf{M}' \oplus \mathsf{M}''$, where $\mathsf{M}''$ is also an invariant subspace. Applying
analogous arguments to $\mathsf{M}''$ and to all invariant subspaces of dimension greater than
1 that we obtain along the way, we finally arrive at a decomposition of the space
$\mathsf{M}(G)$ as a direct sum of one-dimensional invariant subspaces. We have thereby
proved the following result.


Theorem 14.31 The space $\mathsf{M}(G)$ of functions on a finite abelian group $G$ can be
decomposed as a direct sum of one-dimensional subspaces that are invariant with
respect to the regular representation. In each such subspace, one can take as a basis
vector some character $\chi(g)$. Then the matrix of the representation that is induced
in this subspace coincides with this same character $\chi(g)$.


It is obvious that we thereby establish a bijective relationship between the
characters of the group $G$ and one-dimensional invariant subspaces of the space $\mathsf{M}(G)$
of functions on this group. Indeed, two distinct characters $\chi_1$ and $\chi_2$ cannot be basis
vectors of one and the same representation: that would mean that

$$\chi_1(g) = \alpha\chi_2(g) \quad \text{for all } g \in G.$$

Setting here $g = e$, we obtain $\alpha = 1$, since $\chi_1$ and $\chi_2$ are homomorphisms of the
group $G$ into $\mathbb{C}$, and therefore, $\chi_1(e) = \chi_2(e) = 1$. Hence $\chi_1 = \chi_2$, contrary to
the assumption that the characters are distinct.

Since by Corollary 14.20, a regular representation can be decomposed into a
direct sum of irreducible representations, we obtain the following results for every
finite abelian group $G$.


Corollary 14.32 The characters form a basis of the space M(G) of functions on the 
group G. 


This assertion can be reformulated as follows. 


Corollary 14.33 The number of distinct characters of a group G is equal to its 
order. 


This follows from Corollary 14.32 and the fact that the dimension of the space 
M(G) is equal to the order of the group G. 




Corollary 14.34 Every function on the group G is a linear combination of charac- 
ters. 


Example 14.35 Let $G = \{g\}$ be a cyclic group of finite order $n$, $g^n = e$. Let us
denote by $\varepsilon_0, \ldots, \varepsilon_{n-1}$ the distinct $n$th roots of 1, and let us set

$$\chi_i(g^k) = \varepsilon_i^k, \quad k = 0, 1, \ldots, n - 1.$$

It is easily verified that $\chi_i$ is a character of the group $G$ and that the characters $\chi_i$
corresponding to $\varepsilon_i$, the distinct $n$th roots of 1, are themselves distinct. Since their
number is equal to $|G|$, they must be all the characters of the group $G$. By
Corollary 14.32, they form a basis of the space $\mathsf{M}(G)$. In other words, in an $n$-dimensional
space, the vectors $(1, \varepsilon_i, \ldots, \varepsilon_i^{n-1})$ corresponding to the $n$th roots of 1 form a basis.
This can also be verified directly by calculating the determinant consisting of the
coordinates of these vectors as a Vandermonde determinant (p. 41).
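
The characters of a cyclic group are easy to tabulate, and the statement that every function on
the group is a linear combination of them can be tested numerically. The following is a minimal
sketch under stated assumptions: the roots of unity are numbered as
$\varepsilon_i = \cos(2\pi i/n) + \sqrt{-1}\,\sin(2\pi i/n)$, a function on $G$ is stored as the
list of its values on $e, g, \ldots, g^{n-1}$, and the coefficients are computed from the
orthogonality of the characters, a standard fact that is not proved in the text. The names
`character`, `coefficients`, and `combine` are illustrative.

```python
import cmath


def character(i, n):
    """Values chi_i(g^k) = eps_i^k for k = 0, ..., n-1, where eps_i is an nth root of 1."""
    eps = cmath.exp(2j * cmath.pi * i / n)
    return [eps ** k for k in range(n)]


def coefficients(f):
    """Coefficients c_i with f = sum over i of c_i * chi_i.

    Uses the orthogonality of the characters (an assumption not taken
    from the text): c_i = (1/n) * sum over k of f(g^k) * conj(chi_i(g^k)).
    """
    n = len(f)
    return [
        sum(f[k] * character(i, n)[k].conjugate() for k in range(n)) / n
        for i in range(n)
    ]


def combine(coeffs):
    """Values on e, g, ..., g^(n-1) of the combination sum over i of coeffs[i] * chi_i."""
    n = len(coeffs)
    return [sum(coeffs[i] * character(i, n)[k] for i in range(n)) for k in range(n)]


n = 6
chi = character(2, n)
# Relationship (14.23): chi(g^a g^b) = chi(g^a) chi(g^b), exponents taken mod n.
multiplicative = all(
    abs(chi[(a + b) % n] - chi[a] * chi[b]) < 1e-9
    for a in range(n)
    for b in range(n)
)

f = [1, 4, -2, 0, 7, 3]   # an arbitrary function on G, listed as f(g^k)
rebuilt = combine(coefficients(f))
recovered = all(abs(r - v) < 1e-9 for r, v in zip(rebuilt, f))

print(multiplicative, recovered)
```

Comparisons use a small tolerance because the roots of unity are computed in floating point.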


Example 14.36 Let us denote by $S$ the group of rotations of the circle in the plane.
The elements of the group $S$ correspond to points of the circle: if we associate with
a real number $\varphi$ the point of the circle with argument $\varphi$, then with any one point
of the circle will be associated numbers that differ from one another by an integer
multiple of $2\pi$. Therefore, this group $S$ is frequently called the circle group.

After choosing a certain integer $m$, let us associate with the point $t$ of the circle $S$
having argument $\varphi$ the number $\cos m\varphi + i \sin m\varphi$, where $i$ is the imaginary unit. It
is obvious that adding an integer multiple of $2\pi$ to $\varphi$ does not change this number,
which means that it is uniquely defined by the point $t \in S$. Let us set

$$\chi_m(t) = \cos m\varphi + i \sin m\varphi, \quad m = 0, \pm 1, \pm 2, \ldots. \tag{14.24}$$

It is not difficult to verify that the function $\chi_m(t)$ is a character of the group $S$. For
an infinite group such as $S$, it is natural to introduce into the definition of a character,
in addition to the requirement (14.23), the requirement that the function $\chi(t)$ be
continuous. The reason for such a requirement for the group $S$ is as follows: it
is necessary that the real and imaginary parts of the functions $\chi_m(t)$ be continuous
functions.

It is possible to prove that the characters $\chi_m(t)$ defined by formula (14.24) are
continuous and that they comprise all the continuous characters of the circle. This
explains to a large degree the role of the trigonometric functions $\cos m\varphi$ and $\sin m\varphi$
in mathematics: they are the real and imaginary parts of the continuous characters
of the circle.

Corollary 14.34 asserts that every function on a finite abelian group can be
represented as a linear combination of characters. In the case of an infinite group such
as $S$, some analytic restrictions, which we shall not specify here, are naturally
imposed on such a function. We shall only mention the significance of functions on
the group $S$. Such a function $f(t)$ can be represented as a function $F(\varphi)$ of the
argument $\varphi$ of the point $t \in S$. It must not, however, depend on the choice of the
argument $\varphi$ of the point $t$, that is, it must not change on the addition to $\varphi$ of an integer
multiple of $2\pi$. In other words, $F(\varphi)$ must be a periodic function with period $2\pi$.
The analogue of Corollary 14.34 for the group $S$ asserts that such a function can be
represented as a linear combination (in the given case, infinite) of functions $\chi_m(t)$,
$m = 0, \pm 1, \pm 2, \ldots$. In other words, this is a theorem about the fact that a periodic
function (with certain analytic restrictions) can be decomposed into a Fourier series.
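
The decomposition of a periodic function into the characters $\chi_m$ can be illustrated
numerically. The sketch below rests on assumptions not stated in the text: the coefficient of
$\chi_m$ is taken to be the usual Fourier coefficient
$c_m = \frac{1}{2\pi}\int_0^{2\pi} F(\varphi)\,\overline{\chi_m}\,d\varphi$, the integral is
approximated by a Riemann sum over equally spaced points, and the test function `F` is a
hypothetical trigonometric polynomial chosen so that finitely many characters suffice.

```python
import cmath
import math


def fourier_coefficient(F, m, samples=64):
    """Approximate c_m = (1/(2*pi)) * integral over [0, 2*pi] of F(phi) * exp(-i*m*phi).

    The integral is replaced by a Riemann sum over equally spaced points.
    """
    step = 2 * math.pi / samples
    return sum(
        F(k * step) * cmath.exp(-1j * m * k * step) for k in range(samples)
    ) / samples


def partial_sum(coeffs, phi):
    """Value at phi of the sum over m of coeffs[m] * chi_m, chi_m = cos(m*phi) + i*sin(m*phi)."""
    return sum(c * cmath.exp(1j * m * phi) for m, c in coeffs.items())


def F(phi):
    """A hypothetical 2*pi-periodic test function."""
    return 1 + 2 * math.cos(phi) + math.sin(3 * phi)


coeffs = {m: fourier_coefficient(F, m) for m in range(-4, 5)}
phi0 = 0.7
error = abs(partial_sum(coeffs, phi0) - F(phi0))

print(error < 1e-9)
```

For a general periodic function the sum is infinite and the question of convergence depends on
the analytic restrictions mentioned above, which this sketch does not address.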


Historical Note 


Here we shall present a brief chronology of the appearance of the concepts discussed 
in this book. The development of mathematical ideas generally proceeds in such a 
way that some concepts gradually emerge from others. Therefore, it is generally 
impossible to fix accurately the appearance of some particular idea. We shall only 
point out the important milestones and, it goes without saying, shall do so only 
roughly. In particular, we shall limit our view to Western European mathematics. 

The principal stimulus was, of course, the creation of analytic geometry by Fer- 
mat and Descartes in the seventeenth century. This made it possible to specify points 
(on the line, in the plane, and in three-dimensional space) using numbers (one, two, 
or three), to specify curves and surfaces by equations, and to classify them accord- 
ing to the algebraic nature of their equations. In this regard, linear transformations 
were used frequently, especially by Euler, in the eighteenth century. 

Determinants (particularly as a symbolic apparatus for finding solutions of sys- 
tems of n linear equations in n unknowns) were considered by Leibniz in the sev- 
enteenth century (even if only in a private letter) and in detail by Gabriel Cramer 
in the eighteenth. It is of interest that they were constructed on the basis of the rule 
of “general expansion” of the determinant, that is, on the basis of the most complex 
(among those that we considered in Chap. 2) way of defining them. This definition 
was discovered “empirically,” that is, conjectured on the basis of the formulas for 
the solution of systems of linear equations in two and three unknowns. The broadest 
use of determinants occurred in the nineteenth century, especially in the work of 
Cauchy and Jacobi. 

The concept of “multidimensionality,” that is, the passage from one, two, and 
three coordinates to an arbitrary number, was stimulated by the development of 
mechanics, where one considered systems with an arbitrary number of degrees of 
freedom. The idea of extending geometric intuition and concepts to this case was 
developed systematically by Cayley and Grassmann in the nineteenth century. At 
the same time, it became clear that one must study quadrics in spaces of arbitrary 
dimension (Jacobi and Sylvester in the nineteenth century). In fact, this question had 
already been considered by Euler. 




The study of concepts defined by a set of abstract axioms (groups, rings, algebras, 
fields) began as early as the nineteenth century in the work of Hamilton and Cayley, 
but it reached its full flowering in the twentieth century, chiefly in the schools of 
Emmy Noether and Emil Artin. 

The concept of a projective space was first investigated by Desargues and Pascal 
in the seventeenth century, but systematic work in this direction began only in the 
nineteenth century, beginning with the work of Poncelet. 

The axiomatic definition of vector spaces and Euclidean spaces as given in this 
book broke finally with the primacy of coordinates. It was first rigorously formulated 
almost simultaneously by Hermann Weyl and John von Neumann. Both came to
this from work on questions in physics. Then two versions of quantum mechanics
were created: the “wave mechanics” of Schrödinger and the “matrix mechanics” of
Heisenberg. It was necessary to work out that in some sense, they were “one and the 
same.” 

Both mathematicians developed an axiomatic theory of Euclidean spaces and 
vector spaces and showed that quantum-mechanical theories are connected with 
two isomorphic spaces. However, the difference between those theories and what 
we presented in this book lies in the fact that they worked with infinite-dimensional 
spaces. In any case, for finite-dimensional spaces, there appeared an invariant (that 
is, independent of the choice of coordinates) theory that by now has become univer- 
sally accepted. 

The introduction of the axiomatic approach in geometry was discussed in suffi- 
cient detail in Chap. 11, devoted to the hyperbolic geometry of Lobachevsky. Such 
studies began at the end of the nineteenth century, but their definitive influence in 
mathematics dates from the beginning of the twentieth century. The central figure 
here was Hilbert. For example, he contributed to the application of geometric intu- 
ition to many problems in analysis. 


References 


We recall first those books that were in vogue when the lectures on which this book 
is based were given. Many of these books have been reprinted, and we have tried to 
provide information on the latest available version.¹


1. I.M. Gelfand, Lectures on Linear Algebra (Dover, New York, 1989) 

2. A.G. Kurosh, Linear Equations from a Course of Higher Algebra (Oregon State 
University Press, Corvallis, 1969) 

3. F.R. Gantmacher, The Theory of Matrices (American Mathematical Society,
Chelsea, 1959)

4. A.I. Malcev, Foundations of Linear Algebra (Freeman, New York, 1963)

5. P.R. Halmos, Finite-Dimensional Vector Spaces (Springer, New York, 1974)

6. G.E. Shilov, Mathematical Analysis: A Special Course (Pergamon, Elmsford,
1965)

7. O. Schreier, E. Sperner, Introduction to Modern Algebra and Matrix Theory,
2nd edn. (Dover, New York, 2011)

8. O. Schreier, E. Sperner, Einführung in die analytische Geometrie und Algebra
(Teubner, Leipzig, 1931)


The book by Shilov is of particular interest for its large number of analytic applica- 
tions. The following books could also be recommended. However, the conciseness 
of their presentation and abstract approach put them far beyond the capacity of the 
average student. 


9. B.L. Van der Waerden, Algebra (Springer, New York, 2003) 
10. N. Bourbaki, Algebra I (Springer, Berlin, 1998) 
11. N. Bourbaki, Algebra II (Springer, Berlin, 2003) 


Since the lectures on which this book is based were given, so many books on the 
subject have appeared that we give here only a small sample. 


¹Translator’s note: Wherever possible, English-language versions have been given. Some of these
were written originally in English, while others are translations from original Russian or German 
sources. 




12. E.B. Vinberg, A Course in Algebra (American Mathematical Society, Provi- 
dence, 2003) 

13. A.I. Kostrikin, Yu.I. Manin, Linear Algebra and Geometry (CRC Press, Boca 
Raton, 1989) 

14. A.I. Kostrikin, Exercises in Algebra: A Collection of Exercises (CRC Press,
Boca Raton, 1996)

15. S. Lang, Algebra (Springer, 1992)

16. M.M. Postnikov, Lectures in Geometry: Semester 2 (Mir, Moscow, 1982)

17. D.K. Faddeev, Lectures on Algebra (Lan, St. Petersburg, 2005) (in Russian)

18. R.A. Horn, C.R. Johnson, Matrix Analysis (Cambridge University Press,
Cambridge, 1990)


With regard to applications to mechanics, see the book The Theory of Matrices by 
Gantmacher mentioned above as well as the following. 


19. F.R. Gantmacher, Oscillation Matrices and Kernels and Small Vibrations of
Mechanical Systems (American Mathematical Society, Providence, 2002)


Relationships with differential geometry, which we briefly touched on in this course, 
are described, for example, in the following. 


20. A.S. Mishchenko, A.T. Fomenko, A Course of Differential Geometry and Topol- 
ogy (Mir, Moscow, 1988) 


In presenting Lobachevsky’s hyperbolic geometry, we have followed for the most 
part the following brochure. 


21. B.N. Delone, Elementary Proof of the Consistency of Hyperbolic Geometry 
(State Technical Press, Moscow, 1956) (in Russian) 


All the results concerning the foundations of geometry whose proofs we omitted are 
contained in the following books. 


22. N.Yu. Netsvetaev, A.D. Alexandrov, Geometry (Nauka, Fizmatlit, Moscow, 
1990) 
23. N.V. Efimov, Higher Geometry (Mir, Moscow, 1980) 


Facts about analytic geometry that were briefly mentioned in this course, such as 
the connection with the theory of quadrics, can be found in the following books. 


24. P. Dandelin, Mémoire sur l’hyperboloide de révolution, et sur les hexagones 
de Pascal et de M. Brianchon. Nouveaux mémoires de l’Académie Royale des 
Sciences et Belles-Lettres de Bruxelles, T. II (1826), pp. 3-16 

25. B.N. Delone, D.A. Raikov, Analytic Geometry (State Technical Press, Moscow- 
Leningrad, 1949) (in Russian) 

26. P.S. Alexandrov, Lectures in Analytic Geometry (Nauka, Fizmatlit, Moscow, 
1968) (in Russian) 

27. D. Hilbert, S. Cohn-Vossen, Geometry and the Imagination (AMS, Chelsea, 
1999) 

28. A.P. Veselov, E.V. Troitsky, Lectures in Analytic Geometry (Lan, St. Petersburg, 
2003) (in Russian) 


Connections between the hyperbolic geometry of Lobachevsky and other branches 
of projective geometry are well described in the following book. 


29. F. Klein, Nicht-Euklidische Geometrie (Göttingen, 1893). Reprinted by AMS, 
Chelsea, 2000 


In connection with representation theory, the following book is to be recommended. 


30. J.-P. Serre, Linear Representations of Finite Groups (Springer, Berlin, 1977) 


Index 


A 
Affine ratio 
of three points, 298 
Affine subset (of a projective space), 323 
Affinely equivalent subsets, 307 
Algebra, 370 
exterior, 372 
graded, 372 
Angle 
between planes, 237 
between two lines or a line and a plane, 235 
between vectors, 215 
Annihilator, 124 
Associativity, xv, 63, 371, 467 
Axioms of plane geometry, 445 
parallel lines (in Euclidean and hyperbolic 
geometry), 448 


B 
Ball, 222 
Bases 
oriented, 155 
with the same orientation, 277 
Basis 
of a vector space, 89 
dual, 123 
orthonormal (in a Euclidean space), 218 
orthonormal (in a pseudo-Euclidean 
space), 266, 268 
orthonormal (with respect to a bilinear 
form), 401 
of an algebra, 371 
Blocks of a matrix, 65 


C 
Canonical equations (of a quadric), 422 
Canonical form (of a quadratic form), 201 


Center 

of a flag, 301, 442 

of a set, 419 
Central symmetry (of an affine space), 419 
Character, 511 

of the circle (continuous), 513 
Characteristic polynomial, 139 
Circle (group of rotations), 513 
Cofactor, 40, 379 
Combination 

linear, 87 
Commutativity, 473, 484 
Commuting matrices, 64 
Compactness, 341 
Complexification, 151 
Composition 

of linear transformations, 106 

of mappings, xiv 
Cone 

in an affine space, 421, 429 

light (isotropic), 269 
Conic, 392, 430 
Constant terms, 1 
Convergence, xviii, 179, 339 
Coordinates 

of a point, 291 

heterogeneous, 323 

of a vector, 90 

Plücker (of a space), 351 

points 

homogeneous, 320 

Cramer’s rule, 43 
Curvature 

Gaussian, 265 

normal, 263 

principal, 264 
Cylinder, 303 




D 
Deformation (continuous), xx, 158, 343 
Degree of a polynomial, 15, 127 
Delta function, 94, 359 
Determinant, 25, 29 

explicit formula, 53 

Gram, 217 

of a linear transformation, 112 

of a square matrix, 30 

Vandermonde, 41 
Diagonal (of a matrix), 2, 178 
Differential, 131, 293 
Dimension 

of a projective space, 320 

of a representation, 497 

of a vector space, 88 

of an affine space, 291 

of an algebra, 371 
Direct sum 

of representations, 502 

of subgroups, 475 

of submodules, 489 

of subspaces, 84 
Distance between points, 309 
Distributive property, 64, 107, 370 
Divisor 

greatest common (gcd), 487 

(of an element of a ring), 486 
Duality principle, 125, 392 


E 


Echelon form (systems of linear equations), 13 


Eigensubspace, 138 
Eigenvalue, 137 
Eigenvector, 137 
Element 
identity, 370 
inverse (right, left), 467 
negative, 474 
prime (of a ring), 486 
torsion (in a module), 488 
unit (in a ring), 486 
zero, 474 
Elementary row operations (on matrices), 7 
Elements 
associates (in a ring), 486 
homogeneous (in a graded algebra), 373 
relatively prime (in a ring), 487 
Ellipse, 430 
Ellipsoid, 428 
Endomorphism, 102 
Equivalence relation, xii 
Equivalent representations, 501 
Euclidean algorithm (in a ring), 487 




Exterior power 
mth exterior power (of a vector space), 360 


F 
Fiber of a projection, 303 
Field, 485 
of characteristic different from 2, 83, 196 
Flag, 101, 301, 441, 447 
Form, 127 
bilinear, 192 
antisymmetric, symmetric, 193 
nonsingular, 195 
Hermitian, 210 
quadratic, 191 
first, second (of a hypersurface), 262 
positive, negative definite, 205 
sesquilinear, 210 
Formula 
Cauchy–Binet, 377 
change of basis 
for the matrix of a bilinear form, 195 
change of coordinates of a vector, 109 
Euler, 264 
expansion of the determinant along a 
column, 40 
for a change of matrix of a linear 
transformation, 111 
Frame of reference, 291 
orthonormal, 310 
Free mobility (of an affine Euclidean space), 
317 
Function, xiii 
antisymmetric, 46 
exponential of a matrix, 181 
linear, 2 
multilinear, 51, 358 
quadratic Hermitian, 211 
semilinear, 209 
sesquilinear, 210 
symmetric, 44 


G 
Gaussian elimination, 6 
Geometry 
absolute, 448 
elliptic, 464 
projective, 319 
spherical, 462 
Grade (of a principal vector), 162 
Grassmannian, 356 
Group, 467 
abelian, 473 
alternating of degree n, 471 
commutative, 473 


cyclic, 471 
symmetric of degree n, 469 
transformation, 468 


H 
Half-space, 99, 436 
Hexagon 
circumscribed about a conic, 393 
inscribed in a conic, 392 
Homeomorphism, xviii 
Homomorphism (of groups), 471 
Horizon, 324 
Hyperbola, 430 
Hyperboloid of one sheet, 398 
Hyperplane, 89, 294, 322, 435 
tangent, 261, 327, 386 
Hypersurface, 386 


I 
Identity 
Cauchy–Binet, 68 
Euler’s, 130 
Image 
of a homomorphism, 472 
of a linear transformation, 115 
of a mapping, xiii 
of an arbitrary mapping, xiii 
Incidence (points and lines), 319 
Index of inertia, 205, 266 
Inner product 
of vectors, 213, 435 
Interpolation, 15 
Inversion, 49 
Isometry, xxi 
Isomorphism 
of affine spaces, 303 
of Euclidean spaces, 223 
of groups, 472 
of vector spaces, 112 


J 

Jordan 
block, 169 
normal form, 169 


K 
Kernel 
of a homomorphism, 472 
of a linear transformation, 115 


L 
Law of inertia, 205 
Length of a vector, 215 




Limit (of a sequence), xviii, 339 

Linear 
combination, 57 
part (of an affine transformation), 301 
substitution of variables, 62 


M 
Mapping, xiii 
dual, xv 
extension, xiii 
identity, xiii, 102 
perspective, 338 
Matrices 
commuting, 64 
equivalent, 203 
similar, 135 
Matrix, 2 
additive inverse, 60 
adjugate, 73 
antisymmetric, 54 
block, 65 
block-diagonal, 65, 137 
continuously deformable, 158 
diagonal, 74 
echelon form, 13 
Hermitian, 210 
identity, 34 
inverse, 72 
nonsingular, 37 
null, 60 
of a bilinear form, 192 
of a linear transformation, 105 
orthogonal, 225 
singular, 37 
square, 2 
symmetric, 54 
system of linear equations, 2 
transition, 109 
transpose, 53 
Metric, xvii, 309 
Minor, 31 
associates, 69 
leading principal, 206 
Möbius strip, 346 
Module (over a ring), 485 
finitely generated, 488 
Motion 
in the axioms of plane geometry, 445 
of a hyperbolic space, 437 
of an affine Euclidean space, 310 
Multiplication table (in an algebra), 371 
m-vector, 360 
decomposable, 367 




N 
Newton sum, 209 
Null vector, 81 


O 
Operations 

in a group, 474 

in a ring, 484 

in an algebra, 370 
Operator, 102 

first-order differential, 129 
Order 

of a group, 468 

of an element of a group, 471 

of an element of a module, 489 
Orientation 

of a Euclidean space, 230 

of a pseudo-Euclidean space, 277 

of a vector space, 155 
Orthogonal complement, 198, 218, 503 
Orthonormal system of vectors, 218 


P 
Pair of half-spaces, 300 
Parabola, 430 
Parallel subspaces (in an affine space), 295 
Parallelepiped (spanned by vectors), 219 
Path (in a metric space), xx 
Path-connected component, xx 
Permutation, 45, 469 
even, 48 
Plücker relations, 354 
Point 
at infinity, 319, 324 
critical, 253 
fixed, 305 
lying between two other points, 298, 445, 
450 
of a projective space, 320 
of an affine space, 289 
of hyperbolic space, 434 
singular 
of a hypersurface, 387 
of a projective algebraic variety, 327 
Points 
independent, 297, 331 
Poles (of the light cone), 271 
Polynomial, 15, 127, 293 
annihilator, 146, 147 
characteristic, 139 
homogeneous, 127 
in a linear transformation, 141 
matrix, 69 
minimal, 146 




Preimage, xiii 
Principle of duality, 326 
Product 
direct 
of subgroups, 474 
of a matrix by a number, 60 
of elements 
of a group, 467 
of an algebra, 370 
of matrices, 61 
of sets, xvi 
of vectors 
exterior, 360, 368 
Projection, 103, 302 
orthogonal, 216, 219 
Projective 
cover, 325 
line, 320 
plane, 320 
Projectivization, 320 


Q 

Quadric, 385, 414 
nonsingular, 386, 429 

Quadrics 
affinely equivalent, 418 
metrically equivalent, 425 


R 
Radical (of a bilinear form), 198 
Rank 
of a bilinear form, 195 
of a linear transformation, 118 
of a matrix, 55 
Ratio 
of four points (cross, anharmonic), 337 
Rectilinear generatrices (of a hyperboloid), 
398 
Reflection (of a Euclidean space), 229 
Representation, 497 
identity, 499 
induced, 501 
infinite-dimensional, 499 
irreducible, reducible, 502 
regular, 508 
unitarizable, 504 
Representation space, 497 
Representations 
equivalent, 501 
Restriction (of a mapping), xiii 
Ring, 484 
commutative, 484 
Euclidean, 486 




Rotation of a Euclidean space about an axis, 
229 


S 
Segment, 299, 446 
Semiaxes (of an ellipsoid), 254, 428 
Set, xi 

centrally symmetric (in an affine space), 419 

convex (in an affine space), 299 
Sets 

homeomorphic, xviii 
Solution of a system of linear equations, 4 
Space 

affine, 289 

affine Euclidean, 309 

dual, 121 

Euclidean, 213 

hyperbolic, 434 

metric, xvii 

Minkowski, 86, 268 

m-vectors, 360 

of a representation, 497 

of linear functions, 121 

of vectors of an affine space, 291 

projective, 320 

dual, 325 

pseudo-Euclidean, 268 

second dual, 123 

tangent, 261, 327, 386 

vector, 81 
Sphere, 222 
Stereographic projection, 343 
Subgroup, 468 

cyclic, 471 

maximal, 476 
Submodule, 488 

cyclic, 489 
Subspace 

cyclic, 162 


degenerate (of a pseudo-Euclidean space), 266 
invariant 
(with respect to a linear 
transformation), 135 
(with respect to a representation), 501 
isotropic, 395 
linear span of vectors, 87 
nondegenerate (of a pseudo-Euclidean 
space), 266 
of a hyperbolic space, 435 
of a projective space, 322 
dual, 326 
of a vector space, 83 


of an affine space, 294 

solutions of a system of equations, 84 
Subspaces 

directed pair, 101 
Sum 

of matrices, 61 

of subspaces, 84 

direct, 84 

Superalgebra, 373 
Sylvester’s criterion, 206 
System of linear equations, 1 

associated, 11 

consistent, 5 

definite, indefinite, 5 

equivalent, 7 

homogeneous, 10 

inconsistent, 5 

(row) echelon form, 13 

uniquely determined, 5 

upper triangular form, 14 


T 
Theorem 
Bolzano–Weierstrass, 247 
Brianchon’s, 393 
Cayley–Hamilton, 147 
Courant–Fischer, 253 
Euler’s, 316 
Helmholtz–Lie, 443 
Laplace’s, 379 
Pascal’s, 392 
Rouché–Capelli, 56 
Torus, 414 
Transformation 
affine, 301 
linear, 306 
proper, improper, 307 
singular, nonsingular, 304 
antisymmetric, symmetric, 203, 245 
block-diagonalizable, 152 
diagonalizable, 139 
dual, 125 
linear, 102 
Lorentz, 276 
nonsingular, singular, 135 
null, 106 
of a vector space into itself, 133 
orthogonal, 224, 401 
projective, 328 
proper, improper, 276, 402 
singular, nonsingular, 111 
unitary, 255, 503 
Translation (of an affine space), 292 
Transpose 
of a matrix, 53 

Transposition, 45 

Triangle, 446 

Triangle inequality (Cauchy–Schwarz), 310 
in hyperbolic geometry, 458 
in spherical geometry, 463 


U 
Universality (of the exterior product), 365 
Unknowns 

free, 13 

principal, 13 


V 

Variety 
Grassmann, 356 
projective algebraic, 322 
dual, 327 
irreducible, 409 


Vector, 79, 81 


principal, 161 


Vectors 


decomposable, 361 
eigen-, 137 

lightlike (isotropic), 269 
linearly dependent, 87 
linearly independent, 87 
orthogonal, 198, 217 
spacelike, 268 

timelike, 269 


Volume of a parallelepiped 


oriented, 221 
unoriented, 220 


